The Open-Source Software Saving the Internet From AI Bot Scrapers

[email protected]

Scrapers can send these challenges off to dedicated GPU farms or even FPGAs, which are an order of magnitude faster and more efficient.

Let's assume, for the sake of argument, that an AI scraper company actually attempted this. They don't, but let's assume it anyway.

The next Anubis release could, for example, use SHA256 instead of SHA1. This would be a simple and basically transparent update for admins and end users. The AI company that invested in offloading the PoW to somewhere more efficient now has to spend significantly more resources changing its implementation than it took the devs and users of Anubis.

Yes, it technically remains a game of "cat and mouse", but one heavily stacked against the cat. One step for Anubis is 2000 steps for a company reimplementing its client on more efficient hardware. Most Anubis changes can even be made without impacting end users at all. That's a game AI companies aren't willing to play, because they've basically already lost. It doesn't really matter how "efficient" the implementation is if it can be rendered unusable by a small Anubis update.
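To make the SHA1-to-SHA256 example concrete, here is a minimal sketch assuming a hashcash-style check in which the hex digest of challenge plus nonce must start with a given number of zeros. The names (`ALGORITHM`, `digest`, `verify`, `solve`) are hypothetical and this is not Anubis's actual code. In this sketch the server-side change is a single string, while a client built around the old hash keeps producing nonces that the new check will almost always reject.

```python
import hashlib

# Hypothetical server-side setting: in this sketch, changing this one string
# is the whole "update" on the server side.
ALGORITHM = "sha256"


def digest(challenge: str, nonce: int, algorithm: str = ALGORITHM) -> str:
    """Hex digest of challenge + nonce under the given hash algorithm."""
    return hashlib.new(algorithm, f"{challenge}{nonce}".encode()).hexdigest()


def verify(challenge: str, nonce: int, difficulty: int, algorithm: str = ALGORITHM) -> bool:
    """Accept the nonce only if the digest starts with `difficulty` hex zeros."""
    return digest(challenge, nonce, algorithm).startswith("0" * difficulty)


def solve(challenge: str, difficulty: int, algorithm: str) -> int:
    """Brute-force client: try nonces 0, 1, 2, ... until one passes."""
    nonce = 0
    while not verify(challenge, nonce, difficulty, algorithm):
        nonce += 1
    return nonce


if __name__ == "__main__":
    # A client hard-wired to SHA-1 (standing in for specialised hardware)
    # still finds nonces, but they are checked against SHA-256 now.
    old = solve("example-challenge", 3, "sha1")
    new = solve("example-challenge", 3, "sha256")
    print(verify("example-challenge", new, 3))  # True
    print(verify("example-challenge", old, 3))
```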

[email protected]

I don't understand how/why this got so popular out of nowhere... the same solution has already existed for years in the form of haproxy-protection and a couple others... but nobody seems to care about those.

[email protected]

Probably because the creator had a blog post that got shared around at a point in time where this exact problem was resonating with users.

It's not always about being first but about marketing.

[email protected]

You're more than welcome to try to implement something better.

[email protected]

"You criticize society yet you participate in it. Curious."

[email protected]

I've seen this pop up on websites a lot lately. Usually it takes a few seconds to load the website, but there have been occasions where it seemed to hang: it was stuck on that screen for minutes, and I ended up closing the browser tab because the website just wouldn't load.

Is this a (known) issue or is it intended to be like this?

[email protected]

If you're deliberately belittling me I won't engage. Goodbye.

[email protected]

Anubis is basically a Bitcoin miner with the difficulty turned way down (and obviously not producing any coins), so it's inherently random. If it takes minutes, though, something does seem wrong. Maybe a network error?
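A rough sketch of the "Bitcoin miner with the difficulty turned way down" idea, assuming the challenge is a SHA-256 puzzle where the hex digest of challenge plus nonce must start with `difficulty` zeros (an assumption about Anubis's scheme, not taken from its source; `attempts_to_solve` is a hypothetical helper). Each attempt succeeds with probability 16^-d, so the number of attempts is geometrically distributed: the average is 16^d, but individual solves vary widely, which is where the randomness comes from.

```python
import hashlib
import secrets


def attempts_to_solve(challenge: str, difficulty: int) -> int:
    """Count how many nonces are tried before the SHA-256 hex digest of
    challenge + nonce starts with `difficulty` zeros."""
    target = "0" * difficulty
    nonce = 0
    while not hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest().startswith(target):
        nonce += 1
    return nonce + 1  # nonces 0..nonce were all tried


if __name__ == "__main__":
    difficulty = 3  # about 16**3 = 4096 attempts per solve on average
    runs = [attempts_to_solve(secrets.token_hex(16), difficulty) for _ in range(20)]
    print("expected:", 16 ** difficulty)
    print("min:", min(runs), "max:", max(runs), "mean:", sum(runs) / len(runs))
```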

[email protected]

then have them pay for it.

[email protected]

Open source is also the AI scraper bots AND the internet itself, it is every character in the story.

[email protected]

Yes, it would make Lemmy as unsearchable as Discord, instead of as unsearchable as Pinterest.

[email protected]

Support, pay, and get it

[email protected]

All this could be avoided by requiring users to submit a photo ID to log into an account.

[email protected]

This would not be a problem if one bot scraped once and the result was then mirrored to everyone on Big Tech's dime (Cloudflare, Tailscale), but since they are all competing now, they think their edge is going to be their own better scraper setup, and they won't share.

Maybe there should just be a web-to-torrent bridge, so the data is pushed out once by the server and the swarm does the heavy lifting as a cache.

[email protected]

No, it'd still be a problem; every diff between commits is expensive to render to web, even if "only one company" is scraping it, "only one time". Many of these applications are designed for humans, not scrapers.

[email protected]

The source, I assume: challenges/metarefresh.

[email protected]

It’s not always about being first but about marketing.

And one has a cute catgirl mascot, the other a website that looks like a blockchain techbro startup.
I'm even willing to bet the number of people who set up Anubis just to get the cute splash screen isn't insignificant.

[email protected]

Adding to this, some sites set the difficulty way higher than others: nerdvpn's Invidious and Redlib instances take about 5 seconds and some ~20k hashes, while privacyredirect's instances are almost instant, with fewer than 50 hashes each time.
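Under the assumption that difficulty counts leading hex zeros of a SHA-256 digest (not confirmed against Anubis's source), each extra unit of difficulty multiplies the expected work by 16, which makes a gap like ~20k hashes versus fewer than 50 plausible between differently configured sites. A small sketch of that arithmetic with hypothetical helper names; it makes no claim about which difficulty either site actually uses.

```python
def expected_attempts(difficulty: int) -> int:
    """Mean number of hashes when each hash succeeds with probability 16**-difficulty."""
    return 16 ** difficulty


def p_solved_within(difficulty: int, attempts: int) -> float:
    """Probability of finding a solution within `attempts` hashes (geometric CDF)."""
    p = 16.0 ** -difficulty
    return 1.0 - (1.0 - p) ** attempts


if __name__ == "__main__":
    for d in range(1, 6):
        print(
            f"difficulty {d}: expected {expected_attempts(d):>8} hashes, "
            f"P(done within 50) = {p_solved_within(d, 50):.3f}, "
            f"P(done within 20000) = {p_solved_within(d, 20_000):.3f}"
        )
```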

[email protected]

It wasn't made for Bitcoin originally? Didn't know that!

[email protected]

That's not true, search indexer bots should be allowed through from what I read here.
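A sketch of how letting search indexers through could work in front of a challenge, assuming first-match rules on the User-Agent plus an optional source-address check (a User-Agent alone is trivially spoofed). The rule names, regexes, the `CHALLENGE` default and the 192.0.2.0/24 documentation range are all hypothetical placeholders, not Anubis's real policy format.

```python
import ipaddress
import re
from dataclasses import dataclass, field


@dataclass
class Rule:
    name: str
    action: str                   # "ALLOW", "DENY" or "CHALLENGE"
    user_agent: str = ".*"        # regex searched in the User-Agent header
    networks: list = field(default_factory=list)  # optional source-address allowlist

    def matches(self, user_agent: str, addr: str) -> bool:
        if not re.search(self.user_agent, user_agent):
            return False
        if not self.networks:
            return True
        ip = ipaddress.ip_address(addr)
        # IPv4/IPv6 mismatches simply don't match.
        return any(ip in ipaddress.ip_network(net) for net in self.networks)


# Hypothetical rules: 192.0.2.0/24 stands in for a range an indexer publishes.
RULES = [
    Rule("example-indexer", "ALLOW", user_agent=r"ExampleBot", networks=["192.0.2.0/24"]),
    Rule("example-scraper", "DENY", user_agent=r"ExampleScraper"),
]


def decide(user_agent: str, addr: str, default: str = "CHALLENGE") -> str:
    """Return the action of the first matching rule, or `default`."""
    for rule in RULES:
        if rule.matches(user_agent, addr):
            return rule.action
    return default
```

With these rules, `ExampleBot` from inside the listed range is allowed, the same User-Agent from anywhere else falls through to the challenge, and `ExampleScraper` is denied outright.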

agnos.is Forums