AI crawlers cause Wikimedia Commons bandwidth demands to surge 50%.
-
Graphite is conductive. A short circuit and fire are Very Bad.
Couldn't you just use a charcoal pencil or crayon instead?
-
Couldn't you just use a charcoal pencil or crayon instead?
Yes, but neither of those write as cleanly. And both are still prone to fragmenting, even if the fragments aren't conductive.
-
Right‽ This is ridiculously stupid when you can download the entirety of Wikipedia in a single package and parse it to your heart's desire
Not only that, but we make it goddamn trivial, not just for Wikipedia but for the other Wikimedia projects too. Scraping like this is just stealing without the attribution that the CC BY-SA 4.0 license demands, and on top of that it kicks down the ladder for people who actually want to use Wikipedia and not the hallucinatory slop you're trying to supplant it with. LLM companies have caused incalculable damage to critical thinking, the open web, and the climate.
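For anyone wondering what "download the package and parse it" looks like in practice, here is a rough sketch. It assumes the dump is a MediaWiki XML export with `<page>`, `<title>`, and `<text>` elements; the tiny `SAMPLE` document and its namespace URI are made up for illustration, and `iter_pages` is a hypothetical helper name, not part of any Wikimedia tooling.

```python
import io
import xml.etree.ElementTree as ET


def local(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream):
    """Yield (title, text) pairs from an XML export, one page at a time."""
    title, text = None, None
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, None
            elem.clear()  # release the finished page's children to keep memory low


# Illustrative stand-in for a real dump (assumed structure, invented content).
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page><title>Graphite</title><revision><text>Graphite is conductive.</text></revision></page>
  <page><title>Space Pen</title><revision><text>A pressurized pen.</text></revision></page>
</mediawiki>"""

pages = list(iter_pages(io.BytesIO(SAMPLE)))
```

For a real compressed dump you would presumably pass something like `bz2.open(path, "rb")` instead of the `BytesIO` object, where `path` points at your local copy. The point is that the whole thing streams from disk without touching the live site at all.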
-
AI: The "pen that can write in zero gravity" when pencils exist.
Pen, pencil. Both are dangerous in the wrong hands.
-
Well I get the analogy, but also I think they didn't use pencils because of the graphite and complications with filtering air or something.
Also, all pens work in zero gravity. They don't rely on gravity to feed ink to the point. Try writing on a piece of paper that's held against the ceiling and it works just fine.
-
Laws should be passed in all countries requiring AI crawlers to request permission before crawling any target site. I have no pity for AI "thieves" that get their models poisoned. F...ing plague, as if the adware and spyware weren't enough...
i doubt the recent uptick in traffic is from “stealing data” for training but rather from agents scraping them for context, eg Edge Copilot, Google’s AI search, SearchGPT, etc.
poisoning the data will likely not help in this situation since there’s a human on the other side that will just do the same search again given unsatisfactory results. like how retries and timeouts can cause huge outages for web scale companies, poisoning search results will likely cause this type of traffic to increase and further increase the chances of DoS and higher bandwidth usage.
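A toy way to put numbers on the retry argument above, under a deliberately simple assumption: each unsatisfying result makes the person search again with probability `p_unsatisfied`, up to `max_retries` extra attempts. The function name and the 50% figure are illustrative, not measurements of any real service.

```python
def expected_requests(p_unsatisfied, max_retries):
    """Expected requests per query when each bad result triggers one more try.

    Attempt i only happens if the previous i results were all unsatisfying,
    so the expectation is the sum of p**i for i = 0..max_retries.
    """
    return sum(p_unsatisfied ** attempt for attempt in range(max_retries + 1))


baseline = expected_requests(0.0, 3)  # nobody retries: one request per query
poisoned = expected_requests(0.5, 3)  # half of all results prompt a retry
```

Under those assumptions the poisoned case costs 1.875 requests per query instead of 1, so the load on the scraped site goes up rather than down.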
-
Also, all pens work in zero gravity. They don't rely on gravity to feed ink to the point. Try writing on a piece of paper that's held against the ceiling and it works just fine.
Just tried it, and the ink stopped. There's no wick in it and apparently any capillary action is stopped by gravity. It wrote for a little bit for as long as there was enough ink sticking to the ball, but that didn't last more than a few sentences.
In zero gravity, since there's no gravity pulling the ink in either direction, a typical ballpoint pen would likely write inconsistently as the ink shifts in the tube from inertial forces, like a pen that's drying out.
-
To be clear, network costs represent a tiny fraction of WMF's expenses. Much of WMF's budget goes to social programs, not technical upkeep.
-
AI: The "pen that can write in zero gravity" when pencils exist.
This pen / pencil thing has been corrected so many times for so many decades that it's ludicrous people are still bringing it up.
https://www.scientificamerican.com/article/fact-or-fiction-nasa-spen/
Random bits of pencil lead floating around in a high tech environment is such a poor idea that even the Soviets quit using pencils once Fisher's Space Pen was available. A pen which Fisher itself paid to develop and then sold to both NASA and the Soviet Space Program.
-
Right‽ This is ridiculously stupid when you can download the entirety of Wikipedia in a single package and parse it to your heart's desire
-
i doubt the recent uptick in traffic is from “stealing data” for training but rather from agents scraping them for context, eg Edge Copilot, Google’s AI search, SearchGPT, etc.
poisoning the data will likely not help in this situation since there’s a human on the other side that will just do the same search again given unsatisfactory results. like how retries and timeouts can cause huge outages for web scale companies, poisoning search results will likely cause this type of traffic to increase and further increase the chances of DoS and higher bandwidth usage.
So? Break context scrapers till they give up, on your site or completely.
-
An HTTP request is a request. Servers are free to rate limit or deny access
Rate limiting in itself requires resources that are not always available. For one thing, you can only rate limit clients you can identify, so you need to keep data about past requests in memory and attach counters to them. Even then, that won't help if the requests come from IPs that are easily changed.
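To make both problems concrete, here is a minimal fixed-window limiter keyed by client IP. It is only a sketch: the class name, the limits, and the documentation-range IP addresses are all made up, and a real deployment would also need to evict old entries.

```python
import time


class FixedWindowLimiter:
    """Allow at most `limit` requests per client in each `window_seconds` window."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.counters = {}  # client_id -> (window_start, count); grows with every new client

    def allow(self, client_id):
        now = self.clock()
        start, count = self.counters.get(client_id, (now, 0))
        if now - start >= self.window:
            start, count = now, 0  # previous window expired, start a fresh one
        if count >= self.limit:
            self.counters[client_id] = (start, count)
            return False
        self.counters[client_id] = (start, count + 1)
        return True


# Fake clock so the example is deterministic.
fake_now = [0.0]
limiter = FixedWindowLimiter(limit=2, window_seconds=60, clock=lambda: fake_now[0])

same_ip = [limiter.allow("203.0.113.7") for _ in range(3)]  # third request is refused
rotated_ip = limiter.allow("203.0.113.8")  # a new address starts with a clean counter
tracked_clients = len(limiter.counters)  # one stored entry per address seen
```

The third request from the same address is refused, but the "same" crawler coming back from a different address sails through, and the server is now holding state for every address it has ever seen.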
-
So? Break context scrapers till they give up, on your site or completely.
easily said
-
And the quality of the AI output sucks. I was recently looking for information about positive convention for yaw, pitch, and roll in aircraft. I was looking at az and yaw and got reasonable results from the AI, but when I looked at pitch and el all of the results were about elevator pitches. Even when I spelled out elevation it insisted on elevator pitches. I scroll past the AI results as a matter of principle, but I usually look at them so I have something specific to complain about when people ask why I am so virulently anti-AI.
-
This pen / pencil thing has been corrected so many times for so many decades that it's ludicrous people are still bringing it up.
https://www.scientificamerican.com/article/fact-or-fiction-nasa-spen/
Random bits of pencil lead floating around in a high tech environment is such a poor idea that even the Soviets quit using pencils once Fisher's Space Pen was available. A pen which Fisher itself paid to develop and then sold to both NASA and the Soviet Space Program.
Yeah, I know it's not precisely correct, but it's a fable that's commonly understood as an example of over-engineering. I'm open to better and more factual examples, if you have any!
-
And the quality of the AI output sucks. I was recently looking for information about positive convention for yaw, pitch, and roll in aircraft. I was looking at az and yaw and got reasonable results from the AI, but when I looked at pitch and el all of the results were about elevator pitches. Even when I spelled out elevation it insisted on elevator pitches. I scroll past the AI results as a matter of principle, but I usually look at them so I have something specific to complain about when people ask why I am so virulently anti-AI.
AI is useful for basic, mundane tasks and that's about it. Trying to force it to be some sort of Uber search engine is such a bad idea.
-
And the quality of the AI output sucks. I was recently looking for information about positive convention for yaw, pitch, and roll in aircraft. I was looking at az and yaw and got reasonable results from the AI, but when I looked at pitch and el all of the results were about elevator pitches. Even when I spelled out elevation it insisted on elevator pitches. I scroll past the AI results as a matter of principle, but I usually look at them so I have something specific to complain about when people ask why I am so virulently anti-AI.
Yea that’s a bad example of what to use ai for at least right now. You’re going to get bad results with that question.
It’s good for things, if you pay.
-
Yea that’s a bad example of what to use ai for at least right now. You’re going to get bad results with that question.
It’s good for things, if you pay.
I don't want to ask ai. Google automatically gives me ai search results that are piss poor. Those useless results still use energy to generate.
-
And the quality of the AI output sucks. I was recently looking for information about positive convention for yaw, pitch, and roll in aircraft. I was looking at az and yaw and got reasonable results from the AI, but when I looked at pitch and el all of the results were about elevator pitches. Even when I spelled out elevation it insisted on elevator pitches. I scroll past the AI results as a matter of principle, but I usually look at them so I have something specific to complain about when people ask why I am so virulently anti-AI.
I recently started as a graphic designer despite knowing absolutely nothing about it, so I am constantly searching how to do stuff in Adobe suite at work. Half the time Google's AI can't even keep "Cmd" and "Ctrl" straight, telling me to use "Cmd+Shift+H" on Windows or "Ctrl+Shift+H" on Mac. I don't even know how it botches that, but it does it about 25% of the time.
-
what assholes .. just fucking download the full package and quit hitting the URL
If I was running infra for them, I’d just start blacklisting abusive IPs without warning