The Open-Source Software Saving the Internet From AI Bot Scrapers
-
I don't think this would help:
By photo ID, I don't mean just any photo, I mean "photo id" cryptographically signed by the state, certificates checked, database pinged, identity validated, the whole enchilada
-
"Yes", for any bits the user sees. The frontend UI can be behind Anubis without issues. The API, including both user and federation, cannot. We expect "bots" to use an API, so you can't put human verification in front of it. These "bots" also include applications that aren't aware of Anubis or are unable to pass it, like all third-party Lemmy apps.
That does stop almost all generic AI scraping, though it does not prevent targeted abuse.
The API, including both user and federation, cannot.
This is theoretically an issue; however, in practice Anubis only weighs requests that appear to come from a browser: https://anubis.techaro.lol/docs/design/how-anubis-works
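A rough sketch of that kind of gate, assuming the "appears to come from a browser" test is a substring match on "Mozilla" in the User-Agent header (which mainstream browsers send). `API_PREFIXES` and the `/api/` path are hypothetical, only there to show how an instance might leave API and app traffic alone; this is not Anubis's actual configuration format.

```python
# Hypothetical gate: only browser-like requests get the proof-of-work page.
# Assumption: "browser-like" means the User-Agent contains "Mozilla".
API_PREFIXES = ("/api/",)  # hypothetical: paths reserved for API clients


def should_challenge(user_agent: str, path: str) -> bool:
    """Return True if this request would be sent to the challenge page."""
    if path.startswith(API_PREFIXES):
        return False  # API/federation traffic is never challenged
    return "Mozilla" in user_agent  # non-browser clients pass straight through


print(should_challenge("Mozilla/5.0 (X11; Linux x86_64)", "/post/1"))
print(should_challenge("okhttp/4.12.0", "/post/1"))
```

Under this assumption, a third-party app that doesn't identify itself as "Mozilla" never sees the challenge, which would be consistent with the Jerboa report below.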
I just tested my instance with Jerboa and it seems to work just fine.
-
It's funny that older captchas could be viewed as proof-of-work algorithms now, because image recognition is so good. (From using captchas.)
Interesting stance. I have bought many tens of thousands of captcha solves for legitimate reasons, and I have now completely lost faith in them.
-
So they make the internet worse for poor people? I could get through 20k in a second, but someone with just an old laptop would take a few minutes, no?
Isn't that just the way things work in general though? If you have a worse computer, everything is going to be slower, broadly speaking.
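Some back-of-the-envelope arithmetic, reading "20k in a second" as 20,000 hashes per second. Assumptions: the challenge requires a SHA-256 hex digest starting with `difficulty` zero characters, and a difficulty of 4 is used purely as an illustrative value. Each hex digit is "0" with probability 1/16, so the number of attempts is geometric with mean 16^difficulty. The 500 hashes/second figure for an old laptop is made up for comparison.

```python
def expected_hashes(difficulty: int) -> int:
    """Mean attempts until a hex digest starts with `difficulty` zeros."""
    return 16 ** difficulty


def expected_seconds(difficulty: int, hashes_per_second: float) -> float:
    """Average solve time at a given hash rate."""
    return expected_hashes(difficulty) / hashes_per_second


fast = expected_seconds(4, 20_000)  # 65536 / 20000 = 3.2768 s
slow = expected_seconds(4, 500)     # 65536 / 500 = 131.072 s, about 2 minutes
print(f"fast machine: {fast:.1f} s, old laptop: {slow:.1f} s")
```

So the gap is linear in hash rate, and because the count is random, an unlucky visitor can take several times the average on any given visit.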
-
I've seen this pop up on websites a lot lately. Usually it takes a few seconds to load the website, but there have been occasions where it seemed to hang: it was stuck on that screen for minutes, and I ended up closing my browser tab because the website just wouldn't load.
Is this a (known) issue or is it intended to be like this?
I have had a similar experience. Most sites with Anubis take only a few seconds to get through, but I ran into one, I think some small blog, where it took at least 5 minutes. As someone mentioned, it may have been how they set it up, i.e. the number of hashes required. The site that took forever for me seemed to use some exorbitant number like 5k or 50k (I don't recall exactly).
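To see why the "number of hashes required" setting matters so much, here is a minimal proof-of-work sketch. It assumes the client hashes the challenge string concatenated with a counter using SHA-256 and must find a digest with `difficulty` leading hex zeros; the exact input format and the `solve`/`verify` names are hypothetical, not Anubis's code (which runs in the browser). The point is the asymmetry: the client may need thousands of attempts, while the server checks the answer with a single hash.

```python
import hashlib


def meets_target(digest_hex: str, difficulty: int) -> bool:
    """True if the hex digest starts with `difficulty` zero characters."""
    return digest_hex.startswith("0" * difficulty)


def solve(challenge: str, difficulty: int) -> tuple[int, str]:
    """Client side: try nonces 0, 1, 2, ... until one meets the target."""
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if meets_target(digest, difficulty):
            return nonce, digest
        nonce += 1


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: one hash is enough to check the client's work."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return meets_target(digest, difficulty)


nonce, digest = solve("example-challenge", 3)  # roughly 4096 attempts on average
print(nonce, digest, verify("example-challenge", nonce, 3))
```

Each extra zero multiplies the average work by 16, so a difficulty set one or two steps too high can turn a few seconds into minutes.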
-
I don't understand how or why this got so popular out of nowhere... the same solution has already existed for years in the form of haproxy-protection and a couple of others, but nobody seems to care about those.
Probably for a similar reason to why we don't hear about the potentially hundreds of other competing products or solutions to the same problem (in general).
Luck.
It's just not fair in our world.
-
So they make the internet worse for poor people? I could get through 20k in a second, but someone with just an old laptop would take a few minutes, no?
Just wait till they hit my homepage with a 200 MB React frontend, 9 separate tracking/analytics scripts and generic Shopify scripts on it
-
I get that website admins are desperate for a solution, but Anubis is fundamentally flawed.
It is hostile to the user, because it is very slow on older hardware and forces you to use JavaScript.
It is bad for the environment, because it wastes energy on useless computations similar to mining crypto. If more websites start using this, that really adds up.
But most importantly, it won't work in the end. These scraping tech companies have much deeper pockets and can use specialized hardware that is much more efficient at solving these challenges than a normal web browser.
I don't like it either, because my preferred way to use the web is either through the terminal or a very stripped-down browser. I HATE tracking and JS.
-
By photo ID, I don't mean just any photo, I mean "photo id" cryptographically signed by the state, certificates checked, database pinged, identity validated, the whole enchilada
That would have the same effect as just taking the site offline...
No one is giving a random site their photo ID.
-
That would have the same effect as just taking the site offline...
No one is giving a random site their photo ID.
You'd be surprised: many humans simply have no backbone, common sense or self-respect, so I think they very probably still would, in large numbers. Proof: Facebook and Palantir.