AI crawlers cause Wikimedia Commons bandwidth demands to surge 50%.
-
I still struggle with a use case for artificial intelligence in my own life. I play around with it all and I'm just like, it doesn't do a good job. Also, I think humanity is missing the plot, you know? Like, we don't need government if government isn't going to do government. Government serves the people, not corporations. Or at least it should. I don't know, I think we're entering in times. At some point, I think people will pray for nuclear war, because life will be so miserable that it would be better just to end it all.
At some point, I think people will pray for nuclear war, because life will be so miserable.
Reminds me of Roll out the Fallout by The Chalkeaters
-
wikipedia should install ai mazes on their servers
Are there alternatives besides Cloudflare's solution?
-
So, uh. What about Lemmy?
They can also crawl this publicly-accessible social media source for their data sets.
I'm on board with abandoning mainstream social media, but my point is that your suggestion would not solve the problem, just relocate it. A better solution to the AI conglomerates stealing everyone's data from the open Internet is legislation and regulations, i.e. tackling the whole 'stealing data' component, along with stronger privacy regulations for everyone to make it harder for them to do the same in the future. It's nice seeing the EU taking some positive steps, but we will not see the US take any steps in that direction anytime soon, due to corporate capture of their politicians and the AI companies all being in the top 10 most wealthy companies in the US.
They can also crawl this publicly-accessible social media source for their data sets.
Crawling would be silly. They can simply set up a Lemmy node and subscribe to every other server. An ActivityPub subscriber would be much more efficient than a crawler, as it wouldn't accidentally re-crawl things that haven't changed, but instead can read the ActivityPub updates.
-
Are there alternatives besides Cloudflare's solution?
Nepenthes does about the same thing but isn't managed by a corp.
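For anyone wondering what an "AI maze" boils down to, here's a minimal sketch of the general idea: every URL returns a page of links to more URLs, so a crawler that ignores the rules never runs out of pages. This is not how Nepenthes or Cloudflare actually implement it; `maze_page` and the `/maze` path are hypothetical names I made up for illustration, and as I understand it real tarpits also serve filler text and respond slowly on purpose.

```python
import hashlib

def maze_page(path: str, links: int = 5) -> str:
    """Return an HTML page whose links are derived from a hash of `path`.

    Hypothetical helper for illustration only. The same path always yields
    the same page, so nothing has to be stored on the server.
    """
    if not 1 <= links <= 8:
        # A SHA-256 hex digest is 64 characters, enough for 8 tokens of 8 characters.
        raise ValueError("links must be between 1 and 8")
    digest = hashlib.sha256(path.encode("utf-8")).hexdigest()
    # Slice the digest into short tokens; each one becomes a child URL.
    children = [digest[i * 8:(i + 1) * 8] for i in range(links)]
    base = path.rstrip("/")
    items = "\n".join(f'<li><a href="{base}/{c}">{c}</a></li>' for c in children)
    return f"<html><body><ul>\n{items}\n</ul></body></html>"

if __name__ == "__main__":
    print(maze_page("/maze"))
```

Each child link leads to another generated page, so the link graph is effectively endless while costing the server only one hash per request.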
-
This post did not contain any content.
Doesn't make any sense. Why would you crawl Wikipedia when you can just download a dump as a torrent?
-
I still struggle with a use case for artificial intelligence in my own life. I play around with it all and I'm just like, it doesn't do a good job. Also, I think humanity is missing the plot, you know? Like, we don't need government if government isn't going to do government. Government serves the people, not corporations. Or at least it should. I don't know, I think we're entering in times. At some point, I think people will pray for nuclear war, because life will be so miserable that it would be better just to end it all.
Actual AI?
Imagine your phone knows that you have a business meeting downtown today. It's already reserved a parking space for you, set your car to warm up before you leave and looped your contact in on your ETA, along with automatically notifying you of any delays. Then, your kid wakes up this morning with a horrible toothache. You ask your phone what to do and it rings up your family dentist, who has a full schedule today but makes you a referral nearby. You agree to try that other dentist today, and your AI books an appointment, checks your meeting today, coordinates with their AIs and approves a 15 minute delay so you can get to the dentist. It also notifies your kid's school of their absence and has their teacher's AI automatically queued up to send transcripts, notes and homework assignments from today's classes.
That's the kind of stuff actual AI can do. Overgrown autocorrect? It's basically a multi-billion dollar Magic Eightball.
-
The other day I tried to have it help me with a programming task on a personal project. I am an experienced programmer, but I only "get by" in Python (typically just by looking up the documentation for the standard library). I thought, "OK. This is it. I will ask Llama 3.3 and GPT4 for help."
That shit literally set me back a weekend. It gave me such bad approaches and answers, which I could tell were bad (aforementioned experience in programming, degree in comp sci, etc.), that I got confused about writing Python. Had I just done what I usually do, which is to look up the documentation and use my brain, I would have gotten my weekend task done a whole weekend sooner.
It scares me to think what people are doing to themselves by relying on this, especially if they're novices.
It scares me to think what people are doing to themselves by relying on this, especially if they're novices.
Same here. There's a lot of denial going on, but LLMs are not good for anything that requires factual information. They likely never will be, on account of just being statistical models for language. Summarizing long text where correctness isn't an issue is really one of the only places where I still think that they are good.
Search? Not if you want anything factual with citations.
Code? Fuck no. They constantly produce code of poor quality that may depend on non-existent libraries or functionality. More time is spent debugging than writing code, and it leaves the dev with a poor understanding of what the code actually does and ways to optimize/extend/etc.
Generating literary smut? Well, it's not going to do as good of a job as a person who can create something completely novel but can be passable without likely harm to authors (I'd classify it as a tier below erotic fan fiction).
-
I still struggle with a use case for artificial intelligence in my own life. I play around with it all and I'm just like, it doesn't do a good job. Also, I think humanity is missing the plot, you know? Like, we don't need government if government isn't going to do government. Government serves the people, not corporations. Or at least it should. I don't know, I think we're entering in times. At some point, I think people will pray for nuclear war, because life will be so miserable that it would be better just to end it all.
AI has niches but they're exactly that: Niches. Small duct tape tasks for fudging over "hard problems" where manual code would result in a worse outcome and take far more time. Little esoteric problem spaces, which notably don't actually require you to use several states worth of electrical power training on a 50PB dataset of anime titties.
An example: I have a name generator in my game that strings together several consonant+vowel phoneme pairs into a name. This means that the names are always pronounceable, but often the spelling looks really unintuitive. E.g. Joosiffe, which the player would likely pronounce as Joseph. However, the leap we do in our head between those two spellings is a process of declassifying phonemes and then re-classifying phonemes, and is actually a "hard problem" from a coding perspective due to the unintuitive, multifarious complexities of written, spoken, and conceptualized human language. Adding this step to my name generator in code would be a project of its own, larger than the game itself, and wouldn't ever work nearly as well as it needed to. But relatively small (30MB) AI models that do this with something like 99.8% satisfaction already exist. They didn't require a data center's worth of resources to train, and since they're academic projects they have licenses that allow them to be used for free in a game.
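To make the first half of that concrete, here is a rough sketch of a generator that strings consonant+vowel pairs together. The phoneme lists and the `make_name` function are my own assumptions for illustration, not the actual inventory or code from the game, and the respelling step (Joosiffe to Joseph) is exactly the part this does not attempt.

```python
import random

# Hypothetical phoneme inventory; the post does not list the real one.
CONSONANTS = ["b", "d", "f", "j", "k", "l", "m", "n", "r", "s", "t", "v", "z"]
VOWELS = ["a", "e", "i", "o", "u", "oo", "ee"]

def make_name(pairs: int, rng: random.Random) -> str:
    """Join `pairs` consonant+vowel syllables and capitalize the result."""
    syllables = [rng.choice(CONSONANTS) + rng.choice(VOWELS) for _ in range(pairs)]
    return "".join(syllables).capitalize()

if __name__ == "__main__":
    rng = random.Random(42)  # seeded so a run is repeatable
    print([make_name(3, rng) for _ in range(5)])
```

Every result alternates consonant and vowel sounds, which is why it is always pronounceable, and also why the spelling can come out looking odd.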
-
I’m dyslexic and basically a terrible writer. It has helped my professional communication develop. It really helps me get past the issues my disability causes more quickly and feel confident in my communications.
This is a cool use case. Just make sure you retain your own voice! If you read an AI-generated sentence out loud and think "I'd have said it this way instead", you should absolutely then change it to be that way.
-
Doesn't make any sense. Why would you crawl Wikipedia when you can just download a dump as a torrent?
AI bros aren't that smart.
-
The other day I tried to have it help me with a programming task on a personal project. I am an experienced programmer, but I only "get by" in Python (typically just by looking up the documentation for the standard library). I thought, "OK. This is it. I will ask Llama 3.3 and GPT4 for help."
That shit literally set me back a weekend. It gave me such bad approaches and answers, which I could tell were bad (aforementioned experience in programming, degree in comp sci, etc.), that I got confused about writing Python. Had I just done what I usually do, which is to look up the documentation and use my brain, I would have gotten my weekend task done a whole weekend sooner.
It scares me to think what people are doing to themselves by relying on this, especially if they're novices.
We're going to be entering a golden age of hacks in the next 5 years, I'm calling it now. All this copy-pasted bad ChatGPT code is going to be used in ways that generate security holes the likes of which we've never seen before.
-
Life is "simpler" when you realize governments are just bullies that exercise their power through physical violence or economic violence.
Governments aren't fundamentally bad. Having a governing body along with laws and regulations is a good thing when done beneficently. For example, government is responsible for access to public education, libraries, banking, worker safety, and hospitals - all of which are objectively good things to have as a society. The problems usually occur when some individuals have more power/influence than others to choose what the government does, which is what's happening in much of the world right now.
-
These fucking companies... downloading a torrent of Anna's Archive but crawling Wikipedia. Scourge of mankind.
-
Doesn't make any sense. Why would you crawl Wikipedia when you can just download a dump as a torrent?
To have the most recent data?
-
An HTTP request is a request. Servers are free to rate limit or deny access
Bots lie about who they are, ignore robots.txt, and come from a gazillion different IPs.
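For reference, "rate limit" on the server side usually means something like a per-client token bucket. Below is a minimal sketch, assuming clients are keyed by IP address; the `TokenBucketLimiter` class and its parameters are hypothetical names for illustration. It also shows the weakness raised above: a bot spread across many IPs gets a fresh bucket for each one.

```python
import time

class TokenBucketLimiter:
    """Per-client token bucket (illustrative sketch, not production code).

    Each client may burst up to `capacity` requests; tokens refill at
    `rate` per second. `clock` is injectable so the limiter can be tested.
    """

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self._buckets = {}  # client key -> (tokens, time of last request)

    def allow(self, client: str) -> bool:
        now = self.clock()
        # Unknown clients start with a full bucket.
        tokens, last = self._buckets.get(client, (float(self.capacity), now))
        # Refill for the time elapsed since the last request, capped at capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        allowed = tokens >= 1
        if allowed:
            tokens -= 1
        self._buckets[client] = (tokens, now)
        return allowed

if __name__ == "__main__":
    limiter = TokenBucketLimiter(rate=1.0, capacity=2)
    print([limiter.allow("203.0.113.7") for _ in range(3)])
```

Keying on IP is the assumption that breaks down here: every new address starts with a full bucket, so a crawler rotating through thousands of addresses is barely slowed.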
-
Feel like this belongs in [email protected]
Think I should cross-post?
-
I don't want to ask ai. Google automatically gives me ai search results that are piss poor. Those useless results still use energy to generate.
Google automatically gives me ai search results that are piss poor.
And these results are taken at face value by a shocking number of people. I’ve gotten into niche academic arguments where someone just copied and pasted the AI's completely hallucinated response as “evidence.”
I experimented with using AI to generate basic quizzes for students on concepts like atomic theory or conservation of energy, but maybe 2/20 questions it came up with were any form of accurate/useful. Even when it’s not making shit up entirely, the information is so shallow as to be useless.
-
Feel like this belongs in [email protected]
Think I should cross-post?
Go on, my brother.