The Open-Source Software Saving the Internet From AI Bot Scrapers
-
This would not be a problem if one bot scraped once and the result was then mirrored to everyone on Big Tech's dime (Cloudflare, Tailscale). But since they are all competing now, each thinks its edge is going to be its own, better scraper setup, and they won't share.
Maybe there should just be a web-to-torrent bridge, so the data is pushed out once by the server and the swarm does the heavy lifting as a cache.
No, it'd still be a problem; every diff between commits is expensive to render to web, even if "only one company" is scraping it, "only one time". Many of these applications are designed for humans, not scrapers.
-
A JavaScript-less check was released recently; I just read about it. It uses a meta refresh HTML tag and a delay. It's not the default though, since it's new.
The source I assume: challenges/metarefresh.
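The idea behind a meta-refresh challenge is that a plain `<meta http-equiv="refresh">` tag plus a signed token forces the client to actually wait and follow a redirect, with no JavaScript required. A minimal sketch of how such a challenge page could be generated (the secret, token format, and delay are illustrative assumptions, not Anubis's actual implementation):

```python
import hashlib
import hmac
import time

SECRET = b"demo-secret"  # hypothetical; a real deployment would use a random per-instance key

def make_token(client_ip: str, now: float) -> str:
    # Bind the token to the client and an issue time so it can't be replayed later.
    msg = f"{client_ip}:{int(now)}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()

def challenge_page(client_ip: str, delay: int = 3) -> str:
    now = time.time()
    token = make_token(client_ip, now)
    # No JavaScript needed: the browser waits `delay` seconds, then follows the refresh URL,
    # which carries the token back to the server for verification.
    return (
        "<!doctype html><html><head>"
        f'<meta http-equiv="refresh" content="{delay};url=/?t={token}&ts={int(now)}">'
        "</head><body>Checking your browser...</body></html>"
    )
```

Simple bots that fetch a page once and never follow timed refreshes fail this check, while any real browser passes it without running a single line of script.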
-
Probably because the creator had a blog post that got shared around at a point in time where this exact problem was resonating with users.
It's not always about being first but about marketing.
It’s not always about being first but about marketing.
And one has a cute catgirl mascot, the other a website that looks like a blockchain techbro startup.
I'm even willing to bet the number of people who set up Anubis just to get the cute splash screen isn't insignificant.
-
Anubis is basically a Bitcoin miner with the difficulty turned way down (and obviously not producing any coins), so it's inherently random. If it takes minutes, something does seem wrong though. Maybe a network error?
Adding to this, some sites set the difficulty way higher than others. nerdvpn's Invidious and Redlib instances take about 5 seconds and some ~20k hashes, while privacyredirect's instances are almost instant with fewer than 50 hashes each time.
-
Exactly. It's called proof-of-work and was originally invented to reduce spam email, but was later used by Bitcoin to control the rate at which new blocks are created.
it wasn't made for bitcoin originally? didn't know that!
-
Yes, it would make Lemmy as unsearchable as Discord, instead of as unsearchable as Pinterest.
That's not true, search indexer bots should be allowed through from what I read here.
-
Ooh can this work with Lemmy without affecting federation?
To be honest, I need to ask my admin about that!
-
All this could be avoided by requiring a photo ID to log into an account.
That's awful: it means I would get my photo ID stolen hundreds of times per day. There's also thisfacedoesntexists... It won't work, for many reasons. Not all websites require an account, and even those that do have a hard time implementing just that kind of "personal verification" (like dating apps do). Most "serious" cases use human review of the photo plus a video where you show your face and move in and out of an oval shape...
-
That's awful: it means I would get my photo ID stolen hundreds of times per day. There's also thisfacedoesntexists... It won't work, for many reasons. Not all websites require an account, and even those that do have a hard time implementing just that kind of "personal verification" (like dating apps do). Most "serious" cases use human review of the photo plus a video where you show your face and move in and out of an oval shape...
Also you must drink a verification can !
-
That's not true, search indexer bots should be allowed through from what I read here.
If you allow my SearXNG search scraper through, then an AI scraper is indistinguishable from it.
If you mean "Google and DuckDuckGo are whitelisted", then Lemmy will only be searchable there, on those specific whitelisted hosts. And Google's search index is also an AI scraper bot.
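User-agent strings are trivially forged, which is exactly the problem raised here. One way gatekeepers distinguish a real search indexer from an AI scraper claiming to be one is the reverse-then-forward DNS check that Google documents for Googlebot. A sketch with the resolvers injected as functions so it can be illustrated offline (the function name and structure are my own, not from any particular tool):

```python
from typing import Callable, List

def is_verified_googlebot(
    ip: str,
    reverse_dns: Callable[[str], str],
    forward_dns: Callable[[str], List[str]],
) -> bool:
    """Verify a claimed Googlebot the way Google documents it:
    1. Reverse-resolve the IP and require a googlebot.com / google.com hostname.
    2. Forward-resolve that hostname and require it to map back to the same IP.
    Anyone can put "Googlebot" in a User-Agent header, but only Google
    controls the PTR records for its crawler IP ranges.
    """
    host = reverse_dns(ip)
    if not (host.endswith(".googlebot.com") or host.endswith(".google.com")):
        return False
    return ip in forward_dns(host)
```

In a real deployment the two resolver arguments would wrap actual PTR and A-record lookups; the same pattern works for Bingbot and other crawlers that publish verification domains. A self-hosted SearXNG instance has no such verifiable identity, which is why allowlists end up naming specific big players.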
-
No, it'd still be a problem; every diff between commits is expensive to render to web, even if "only one company" is scraping it, "only one time". Many of these applications are designed for humans, not scrapers.
If rendering data for scrapers were really the problem, then the solution is simple: just offer downloadable dumps of the publicly available information.
That would be extremely efficient and cost fractions of pennies in monthly bandwidth, and the data would be far more usable for whatever they are using it for.
The problem is trying to have freely available data while the host maintains the ability to leverage that data later.
I don't think we can have both of these.
-
it wasn't made for bitcoin originally? didn't know that!
Originally called hashcash: http://hashcash.org/
-
Originally called hashcash: http://hashcash.org/
You know it's old when it doesn't have SSL.
-
Support, pay, and get it
Ah, so it is possible to change it.
-
"You criticize society yet you participate in it. Curious."
You can't freely download and edit society. You can download and edit this piece of software, because it's FOSS. You could download it now and change it, or improve it however you'd like. But you won't, because you're just pretending to be concerned about issues that are made up. Or, being generous from what I can read here, issues that only you have encountered.
-
Non paywalled link https://archive.is/VcoE1
It basically boils down to making the browser do some CPU-heavy calculations before allowing access. This is no problem for a single user, but for a bot farm it would increase the amount of compute power they need by 100x or more.
It also inherently blocks a lot of the simpler bots by requiring JavaScript.
-
Adding to this, some sites set the difficulty way higher than others. nerdvpn's Invidious and Redlib instances take about 5 seconds and some ~20k hashes, while privacyredirect's instances are almost instant with fewer than 50 hashes each time.
So they make the internet worse for poor people? I could get through 20k in a second, but someone with just an old laptop would take a few minutes, no?
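The hardware gap can be put in rough numbers. Assuming the difficulty counts leading zero hex nibbles of a hash (a common scheme for challenges like this; the exact parameters vary per site), the expected work and wait time fall out of simple arithmetic:

```python
def expected_hashes(leading_zero_nibbles: int) -> int:
    # Each hex nibble has a 1/16 chance of being zero, so on average
    # you try 16**n hashes before one starts with n zero nibbles.
    return 16 ** leading_zero_nibbles

def expected_seconds(leading_zero_nibbles: int, hashes_per_second: float) -> float:
    # Average wait for a client that can compute `hashes_per_second` hashes.
    return expected_hashes(leading_zero_nibbles) / hashes_per_second

# Illustrative hash rates (assumed, not measured): at difficulty 4,
# expected_hashes(4) == 65536, so a fast desktop at ~1M H/s averages
# well under a tenth of a second, while an old laptop at ~50k H/s
# averages a bit over a second -- slower, but not minutes.
```

The ~20k-hash solves mentioned above are in the right ballpark for a difficulty of 4 zero nibbles, since individual solves scatter widely around the 65536-hash average.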
-
To be honest, I need to ask my admin about that!
We don't use Anubis, but we use iocaine (?); see /0 for the announcement post.
-
So they make the internet worse for poor people? I could get through 20k in a second, but someone with just an old laptop would take a few minutes, no?
Well, it's the scrapers that are causing the problem.
-
It’s not always about being first but about marketing.
And one has a cute catgirl mascot, the other a website that looks like a blockchain techbro startup.
I'm even willing to bet the number of people who set up Anubis just to get the cute splash screen isn't insignificant.
Compare and contrast.
High-performance traffic management and next-gen security with multi-cloud management and observability. Built for the enterprise — open source at heart.
Sounds like some overpriced, vacuous, do-everything solution. Looks and sounds like every other tech website. Looks like it is meant to appeal to the people who still say "cyber". Looks and sounds like fauxpen source.
Weigh the soul of incoming HTTP requests to protect your website!
Cute. Adorable. Baby girl. Protect my website. Looks fun. Has one clear goal.