AI crawlers cause Wikimedia Commons bandwidth demands to surge 50%.
-
I’m dyslexic and basically a terrible writer. AI has helped my professional communication develop. It really helps me work around the issues my disability causes, faster, and feel confident in my communications.
This is a cool use case. Just make sure you retain your own voice! If you read an AI-generated sentence out loud and think "I'd have said it this way instead", you should absolutely then change it to be that way.
-
Doesn't make any sense. Why would you crawl Wikipedia when you can just download a dump as a torrent?
AI bros aren't that smart.
-
The other day I tried to have it help me with a programming task on a personal project. I am an experienced programmer, but I only "get by" in Python (typically just by looking up the documentation for the standard library). I thought, "OK. This is it. I will ask Llama 3.3 and GPT-4 for help."
That shit literally set me back a weekend. It gave me approaches and answers so bad that I could tell they were bad (aforementioned experience in programming, degree in comp sci, etc.), and they still got me confused about writing Python. Had I just done what I usually do, which is to look up the documentation and use my brain, I would have gotten my weekend task done a whole weekend sooner.
It scares me to think what people are doing to themselves by relying on this, especially if they're novices.
We're going to be entering a golden age of hacks in the next 5 years; I'm calling it now. All this copy-pasted bad ChatGPT code is going to be used in ways that generate security holes the likes of which we've never seen before.
-
Life is "simpler" when you realize governments are just bullies that exercise their power through physical violence or economic violence.
Governments aren't fundamentally bad. Having a governing body along with laws and regulations is a good thing when done beneficently. For example, government is responsible for access to public education, libraries, banking, worker safety, and hospitals - all of which are objectively good things to have as a society. The problems usually occur when some individuals have more power/influence than others to choose what the government does, which is what's happening in much of the world right now.
-
Wikipedia should install AI mazes on their servers.
-
These fucking companies... downloading a torrent of Anna's Archive but crawling Wikipedia. Scourge of mankind.
-
Doesn't make any sense. Why would you crawl Wikipedia when you can just download a dump as a torrent?
To have the most recent data?
-
An HTTP request is a request. Servers are free to rate limit or deny access.
Bots lie about who they are, ignore robots.txt, and come from a gazillion different IPs.
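For context on what "ignoring robots.txt" means: a well-behaved crawler reads the site's rules and checks each URL before fetching it. Here is a minimal sketch using Python's standard `urllib.robotparser`; the robots.txt text, the bot name `ExampleAIBot`, and the `example.org` URLs are hypothetical and are not Wikimedia's actual rules. Nothing enforces this check; a bot that lies about its user agent or never calls it simply isn't bound by it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only (not Wikimedia's real file).
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Crawl-delay: 5
Disallow: /w/
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt rules from a string instead of fetching them over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def polite_fetch_allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """A compliant crawler calls this before every request; a misbehaving bot just skips it."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    # ExampleAIBot is disallowed everywhere.
    print(polite_fetch_allowed(rules, "ExampleAIBot", "https://example.org/wiki/Main_Page"))
    # Other bots may fetch article pages...
    print(polite_fetch_allowed(rules, "OtherBot", "https://example.org/wiki/Main_Page"))
    # ...but not anything under /w/.
    print(polite_fetch_allowed(rules, "OtherBot", "https://example.org/w/index.php"))
    # Requested delay between requests, in seconds, for the catch-all entry.
    print(rules.crawl_delay("OtherBot"))
```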
-
Feel like this belongs in [email protected]
Think I should cross-post?
-
I don't want to ask AI. Google automatically gives me AI search results that are piss poor. Those useless results still use energy to generate.
Google automatically gives me AI search results that are piss poor.
And these results are taken at face value by a shocking number of people. I’ve gotten into niche academic arguments where someone just copy-pasted the AI's completely hallucinated response as “evidence.”
I experimented with using AI to generate basic quizzes for students on concepts like atomic theory or conservation of energy, but maybe 2 out of 20 questions it came up with were at all accurate or useful. Even when it’s not making shit up entirely, the information is so shallow as to be useless.
-
Feel like this belongs in [email protected]
Think I should cross-post?
Go on, my brother.
-
Bots lie about who they are, ignore robots.txt, and come from a gazillion different IPs.
That's what ddos protection is for.
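"Rate limiting" in practice often means something like a token bucket per client. This is a minimal sketch, assuming the limiter is keyed by client IP; the class and function names are hypothetical and the addresses come from the reserved documentation ranges. It shows why the behaviour described above (a gazillion different IPs) slips past this kind of protection: the same 100-request burst is mostly blocked from one address but fully allowed when spread over 100 addresses.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TokenBucket:
    """Token bucket: holds up to `capacity` tokens, refilled at `rate` tokens per second."""
    capacity: float
    rate: float
    tokens: float = field(init=False)
    updated: float = field(init=False)

    def __post_init__(self) -> None:
        self.tokens = self.capacity  # start full, so a short burst is allowed
        self.updated = 0.0

    def allow(self, now: float) -> bool:
        # Refill in proportion to the time elapsed since the last call, capped at capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class PerIpLimiter:
    """Hypothetical limiter that keeps one independent bucket per client IP."""

    def __init__(self, capacity: float, rate: float) -> None:
        self.capacity = capacity
        self.rate = rate
        self.buckets: dict[str, TokenBucket] = {}

    def allow(self, ip: str, now: float) -> bool:
        bucket = self.buckets.get(ip)
        if bucket is None:
            bucket = self.buckets[ip] = TokenBucket(self.capacity, self.rate)
        return bucket.allow(now)


def simulate(ips: list[str], requests: int, limiter: PerIpLimiter) -> int:
    """Send `requests` requests at the same instant, round-robin over `ips`; count the allowed ones."""
    allowed = 0
    for i in range(requests):
        if limiter.allow(ips[i % len(ips)], now=0.0):
            allowed += 1
    return allowed


if __name__ == "__main__":
    # One scraper on one IP: only the initial burst of 10 gets through.
    print(simulate(["203.0.113.7"], 100, PerIpLimiter(capacity=10, rate=1)))
    # The same 100 requests spread over 100 addresses: every one gets through.
    many = [f"198.51.100.{n}" for n in range(100)]
    print(simulate(many, 100, PerIpLimiter(capacity=10, rate=1)))
```

The design choice that matters here is the bucket key: keying on IP is cheap, but it is exactly what a scraper spread across many addresses sidesteps.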
-
Not in this case, to be fair. The only concern is cost, since Wikipedia wouldn't object to them getting the actual data, and AI mazes are designed to safeguard more sensitive data, not to reduce cost.
-
When I imagined a future with AI ruining the world, I always thought it was going to be some Skynet/CABAL/HAL9000 type of thing.
Not this sad, boring, depressing type shit.
-
To have the most recent data?
Wanting the most recent data within a reasonable time frame is one thing. AI companies are like, "I must have every single article within 5 minutes of it being updated, or I'll throw my pacifier out of the pram." They show no regard for the concerns of the source sites.
-
They can also crawl this publicly accessible social media source for their data sets.
Crawling would be silly. They can simply set up a Lemmy node and subscribe to every other server. An ActivityPub crawler would be much more efficient, as it wouldn't needlessly re-crawl things that haven't changed, but could instead read the ActivityPub updates.
Sure, but we're in the comments section of an article about Wikipedia being crawled, which is silly because they could just download a snapshot of Wikipedia.
-
Doesn't make any sense. Why would you crawl Wikipedia when you can just download a dump as a torrent?
Apparently the dump doesn't include media, though there's ongoing discussion within wikimedia about changing that. It also seems likely to me that AI scrapers don't care about externalizing costs onto others if it might mean a competitive advantage.
-
This is a cool use case. Just make sure you retain your own voice! If you read an AI-generated sentence out loud and think "I'd have said it this way instead", you should absolutely then change it to be that way.
Understood, and I do. I try to tweak it a little to my own style. But it helps me write the hundreds of cover letters I’m submitting a day while looking for work. Each one used to take me hours; now I can fly through them.