The internet kind of sucks right now
-
Discord's complete lack of indexing. Although it's definitely not impossible to scrape data from Discord, it would take more resources than, say, Reddit.
If an AI company pays Discord, they won't scrape but get the data directly.
-
I have never understood why people moved stuff to the closed Discord server system.
To build up a close (the two definitions of "close") community. To speak freely (even if you have to respect the TnCs of Discord + community guidelines.)
-
This post did not contain any content.
Self hosting is the way.
-
Got a lot of "knowledge" AI companies are after?
-
The forum I call home tolerates a lot of hate speech.
I think I'm out, but it's less about the AI scraping and more about moderation.
-
It's been shit since covid. Everyone constantly online, and really ramping up the stupid as fuck culture wars. "Back in my day" I could log into a chat room, have some fun conversations, and then log off without getting pissed off or pissed on. I could look at movie news, and not be swamped by performative hate or praise for whatever fucking movie is or isn't "woke".
Everywhere you go, you see "Be civil" or "Be respectful". But all that really means is, don't question our echo chamber. And if you do, well, turns out not being civil towards you doesn't count.
Left and right doesn't matter. It's all hate and performative praise as far as the eye can see.
-
Self hosting is the way.
Doesn't really solve the AI scraping or the silo problem, and as Codeberg found out recently, solving the AI scraping DDoS is never-ending.
-
If I'm going to share my information and knowledge publicly on an Internet site, I'd like everyone to have fair and open access to it, not at the whims of a multinational corp to gatekeep for me. So the fact that AI can access it too doesn't discourage me.
You have information from me because I choose to share it, not because a site has demanded I give it up without a clear benefit to me in return.
I think there are a lot of solid arguments against letting AI steal everything, but with the scraping there's an even more immediate problem. They don't rate limit or do it in an intelligent way. It becomes a full-blown DDoS that has taken down entire sites and slowed many more to the point of near uselessness.
They're in a very literal sense crashing large chunks of the Internet and causing havoc which costs very real money to fix, either by upping server resources or installing AI scraping mitigation so that everyone still has access to the free information you mention.
-
(IANAL) Wouldn't this count as fair use since the AI sex bot is only using snippets?
That's currently being argued in the courts. There's a lot that goes into it, from the right of distribution to proving that the AI bot can reproduce the content even though it normally doesn't.
[A very real example of reproducibility](https://arstechnica.com/features/2025/06/study-metas-llama-3-1-can-recall-42-percent-of-the-first-harry-potter-book/)
There are also arguments about how they accessed large amounts of content. The law doesn't just recognize whether you can access something or not, but what you access it for. There are laws about accessing things with the sole purpose of using them to develop a commercial product. All of it is a tangled mess with no clear answer right now (legally; morally I think there is one, but that's very opinionated).
-
Depends on the forum, plenty of echo chambers out there...
A lot of the forums I'm seeing talked about were more technical or objective kinds. Like in a car forum there'd be repair manuals or parts lists, fountain pen forums would have loads of images comparing inks side by side for different shades and hues. Those are the sorts of knowledge centers being discussed and reminisced about a lot here.
-
I think there are a lot of solid arguments against letting AI steal everything, but with the scraping there's an even more immediate problem. They don't rate limit or do it in an intelligent way. It becomes a full-blown DDoS that has taken down entire sites and slowed many more to the point of near uselessness.
They're in a very literal sense crashing large chunks of the Internet and causing havoc which costs very real money to fix, either by upping server resources or installing AI scraping mitigation so that everyone still has access to the free information you mention.
That is definitely a problem that needs to be dealt with, since AI scrapers hogging bandwidth or making sites inaccessible are hampering equal access for everyone. Ignoring conventions and failing to rate limit are both harmful to the open internet.
So yes, those kinds of AI scraping behaviours should be mitigated, but on the principle of AI ingesting my public data, I'm not against it, if it can access it reasonably and fairly like anyone else.
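For what it's worth, here is a rough sketch of what "reasonably and fairly" could look like on the scraper's side: check the site's robots.txt rules and space out requests. This is only an illustration, not how any real crawler works. The class name `PoliteCrawler`, the `examplebot` user agent, the one-second default delay, and the `download` callback are all made up for the example; only `urllib.robotparser` is from the Python standard library.

```python
import time
from urllib.robotparser import RobotFileParser


class PoliteCrawler:
    """Hypothetical helper: obeys robots.txt and spaces out requests to one site."""

    def __init__(self, robots_txt, user_agent, default_delay=1.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.user_agent = user_agent
        self.rules = RobotFileParser()
        # Parse robots.txt text that the caller has already downloaded.
        self.rules.parse(robots_txt.splitlines())
        # Use the site's Crawl-delay if it sets one, else our assumed default.
        self.delay = self.rules.crawl_delay(user_agent) or default_delay
        self.clock = clock
        self.sleep = sleep
        self.last_request = None

    def allowed(self, url):
        """True if robots.txt permits this user agent to fetch the URL."""
        return self.rules.can_fetch(self.user_agent, url)

    def wait_turn(self):
        """Block until at least `delay` seconds have passed since the last request."""
        now = self.clock()
        if self.last_request is not None:
            remaining = self.delay - (now - self.last_request)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last_request = now

    def fetch(self, url, download):
        """Call `download(url)` only if allowed, and only after waiting our turn."""
        if not self.allowed(url):
            return None
        self.wait_turn()
        return download(url)
```

The clock and sleep functions are passed in so the waiting logic can be checked without real delays or network access. A real crawler would also need per-host tracking and backoff on errors, which this leaves out.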
-
Then search engines wouldn't find anything from the forum either.
True... But what if the forum had its own search engine that could ignore the anti-scraping stuff? The issue would be making a good search engine lol
-
Is that what we're up against? I thought every time I voice my mind on forums it gets upvoted or downvoted or ignored, but always ultimately ignored.
-
What is making you think Discord isn't also selling all its data to AI companies?
-
Lol I knew this one year ago
-
People these days don't realise that confidently incorrect people pre-exist Facebook.
It's different though.
If you were a flat earther in 1982, you probably would have a weird self published "newspaper" by someone 4 times a year, and two or three books and no platform beyond literally shouting on the street at people who all considered you a moron.
Nowadays, if you're a crackpot, you can instantly find 17,000 other crackpots who will happily not just confirm your idiocy, but make up fake stories to support your bullshit ideas. They will also drag you along by pure crank magnetism into other bullshit. You can spread your bullshit far and wide, and since people are automatically served with similar content, you're even likely to find other idiots like you "in the wild", which is actually an algorithmic bubble.
Before, nobody you met in real life would agree with you. Nowadays, everyone you "meet" online agrees with you.
So yes, confidently incorrect people have always been there, but not in these numbers, and rarely to this level of confidence. That's why people react so vehemently: they rarely ever reach outside their bubble. Your idea that the world is round isn't the general consensus to them; they hear from flat earthers every single hour of the day.
Nowadays, if you're a crackpot, you can instantly find 17,000 other crackpots who will happily not just confirm your idiocy, but make up fake stories to support your bullshit ideas.
And because crackpots like this are very engaged in their crackpottery, it's a great place to put ads. That means that the big Internet ad companies all want to be the ones to host those bullshit ideas.
Back in the day, the reason crackpot newspapers had to be self-published is that the big publishers didn't want to have anything to do with the crackpots. But, in the modern world, Google / Meta can find someone who wants to run an ad next to your crackpottery, so you get the same treatment as a big media publisher. In fact, you might get better treatment because crackpottery may be stickier than, say, the Boston Globe, so Google / Meta might prefer to work with you because it allows them to show more ads.
-
But they index everything.
Just request your data and you’ll get a neat package of all your messages with timestamps and all.
They store your data, they don't correlate your data.
-
They store your data, they don't correlate your data.
So what? You can still sell it to AI companies without assigning a user to each message. They don't care about who wrote it when stealing the content.
-
Let them scrape. AI as it currently is, is still autocomplete with extra steps, and still prone to hallucination. As it is it will be usable to make cheap, passable content, but not hit those moments of inspiration of human art (yet -- there are real AI groups looking to make AGI)
It is a bubble which will pop and AI will be seen as a tool (a resource-costly tool) that requires its own set of experts independent from the experts that use ACAD or write editorial copy or do investigative work. Id est, it's not the replacement of employees that boards of directors want it to be.
And AGI is centuries from being efficient enough that you can make Rosie the Robot who cleans your house and makes a good upside-down pineapple cake.
-
I don't have children. My legacy is running 37 Quora accounts that each answer niche questions very incorrectly, over and over.
The hero we don't deserve