lads
-
Like I said, [edit: at one point] Facebook requested my robots.txt multiple times a second. You've not convinced me that bot writers care about efficiency.
[edit: they've since stopped, possibly because now I give a 404 to anything claiming to be from facebook]
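[edit 2: for anyone curious what that kind of rule can look like, here's a rough sketch, not my actual config; the User-Agent substrings are just the ones Facebook's crawlers are commonly known to send]

```typescript
import { createServer } from "node:http";

// Sketch only: treat anything whose User-Agent claims to be a Facebook
// crawler as "not found", and serve everyone else normally.
const server = createServer((req, res) => {
  const ua = (req.headers["user-agent"] ?? "").toLowerCase();
  if (ua.includes("facebookexternalhit") || ua.includes("facebot")) {
    res.writeHead(404); // pretend the page doesn't exist
    res.end();
    return;
  }
  res.writeHead(200, { "Content-Type": "text/plain" });
  res.end("hello");
});

server.listen(8080);
```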
You've not convinced me that bot writers care about efficiency.
and why should bot writers care about efficiency when what they really care about is time. they'll burn all your resources without regard simply because they're not who's paying
-
You've not convinced me that bot writers care about efficiency.
and why should bot writers care about efficiency when what they really care about is time. they'll burn all your resources without regard simply because they're not who's paying
Yep, they'll just burn taxpayer resources (me and my poor servers) because it's not like they pay taxes anyway (assuming they are either a corporation or not based in the same locality as I am).
There's only one of me and if I'm working on keeping the servers bare minimum functional today I'm not working on making something more awesome for tomorrow. "Linux sysadmin" is only supposed to be up to 30% of my job.
-
Yep, they'll just burn taxpayer resources (me and my poor servers) because it's not like they pay taxes anyway (assuming they are either a corporation or not based in the same locality as I am).
There's only one of me and if I'm working on keeping the servers bare minimum functional today I'm not working on making something more awesome for tomorrow. "Linux sysadmin" is only supposed to be up to 30% of my job.
I mean, I enjoy linux sysadmining, but fighting bots takes time, experimentation, and research, and there's other stuff I should be doing. For example, accessibility updates to our websites. But, accessibility doesn't matter a lick if you can't access the website anyway due to timeouts.
-
There's heavy, and then there's heavy. I don't have any experience dealing with threats like this myself, so I can't comment on what's most common, but we're talking about potentially millions of times more resources for the attacker than the defender here.
There is a lot of AI hype and AI anti-hype right now, that's true.
I do. I have a client with a limited budget whose websites I'm considering putting behind Anubis because they're getting hammered by AI scrapers.
It comes in waves, too, so the website may randomly go down or slow down significantly, which is really annoying because it's unpredictable.
-
It's very intrusive in the sense that it runs a PoW challenge, unsolicited on the client. That's literally like having a cryptominer running on your computer for each challenge.
Everyone can do what they want with their server, of course. But I'm very fond of scraping, for instance. I have FreshRSS running on my server, and the way it works is that when the target website doesn't provide an RSS feed it scrapes the site to get the articles. I also have another service that scrapes pages to detect changes. Conceptually that change-watcher is just fetch, hash, compare; a minimal sketch of the idea is below (not the actual service I run, and the URL is a placeholder).
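```typescript
import { createHash } from "node:crypto";

// Sketch of the "watch a page for changes" kind of scraping described above:
// fetch the page, hash the body, and compare against the last hash you stored.
async function pageChanged(
  url: string,
  lastHash: string | null,
): Promise<{ changed: boolean; hash: string }> {
  const response = await fetch(url);
  const body = await response.text();
  const hash = createHash("sha256").update(body).digest("hex");
  return { changed: lastHash !== null && hash !== lastHash, hash };
}

// e.g. run this on a schedule and persist `hash` between runs
pageChanged("https://example.com/some-page", null).then(({ hash }) => {
  console.log("current content hash:", hash);
});
```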
I think part of the beauty of the internet is being able to automate processes, and software like Anubis puts a globally significant energy tax on these automations.
Once again, everyone is free to do whatever they want with their server. But the thing I like the least is that the software is being marketed, with some great PR, as part of some great anti-AI crusade, and I don't know whether that comes from the devs themselves or from some other party. And I don't like this mostly because I think it's disinformation, and manipulative towards people who are perhaps easy to manipulate if you say the right words. I also think it's a discourse that pushes people toward radicalization around a certain topic, and I'm a firm believer that right now we need to reduce radicalization overall, not increase it.
A proof of work challenge is infinitely better than the alternative of "fuck you, you're accessing this through a VPN and the IP is banned for being owned by Amazon (or literally any data center)"
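For anyone who hasn't looked at what one of these challenges actually involves, here's a rough sketch of the general idea (illustrative only; the hashing scheme and difficulty below are assumptions, not Anubis's actual parameters). The client grinds through nonces until a hash meets a difficulty target, and the server verifies with a single hash, which is where the asymmetry comes from.

```typescript
import { createHash } from "node:crypto";

// Client side of a hash-based proof-of-work challenge: try nonces until the
// digest starts with enough zero hex digits for the requested difficulty.
function solveChallenge(seed: string, difficultyBits: number): number {
  const target = "0".repeat(difficultyBits / 4); // leading hex zeroes
  for (let nonce = 0; ; nonce++) {
    const digest = createHash("sha256").update(`${seed}:${nonce}`).digest("hex");
    if (digest.startsWith(target)) return nonce; // found a valid nonce
  }
}

// Server side: verifying costs a single hash, so the per-request cost stays
// tiny for the defender while the client has to grind through many attempts.
function verifyChallenge(seed: string, nonce: number, difficultyBits: number): boolean {
  const digest = createHash("sha256").update(`${seed}:${nonce}`).digest("hex");
  return digest.startsWith("0".repeat(difficultyBits / 4));
}

const nonce = solveChallenge("example-seed", 16); // ~65k attempts on average
console.log(verifyChallenge("example-seed", nonce, 16)); // true
```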
-
Hail Anubis-chan.
any "bot stopper" ends up stopping me somehow. Including anubis. I'm pretty sure ive been cursed by the rng gods because even at 40 KH/s, I get stuck on the pages for like 2 minutes before it tells me success.
Similar things like hcaptcha or cloudflare turnstile either never load or never succeed. Recaptcha gaslights me into thinking I was wrong.
-
Correct. Anubis' goal is to decrease the web traffic that hits the server, not to prevent scraping altogether. I should also clarify that this works because it costs the scrapers time with each request, not because it bogs down the CPU.
Why not then just make it a setTimeout or something so that it doesn't nuke the CPU of old devices?
-
This post did not contain any content.
The block underneath AI is Python.
-
Why not then just make it a setTimeout or something so that it doesn't nuke the CPU of old devices?
Crawlers don't have to follow conventions or specifications. If one has a setTimeout implementation that doesn't wait the specified amount of time and simply executes the callback immediately, it defeats the system. Proof-of-work is meant to ensure the time cost can't be skipped, because it comes from actual computation rather than an arbitrary delay.

Anubis is an emergency solution against the flood of scrapers deployed by massive AI companies. Everybody wishes it wasn't necessary.
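To make that concrete, here's a rough sketch (hypothetical, not any real crawler's code) of why a pure delay is free for a scraper to skip: it controls the JS environment, so it can patch setTimeout to fire callbacks immediately, while a proof-of-work loop still has to do the hashing no matter what gets patched.

```typescript
// A scraper that controls the JS environment can replace setTimeout so
// callbacks fire immediately, making any "just wait N ms" challenge free.
const patchedSetTimeout = ((handler: () => void, _delay?: number) => {
  handler(); // ignore the requested delay entirely
  return 0;
}) as unknown as typeof setTimeout;

globalThis.setTimeout = patchedSetTimeout;

// What a naive delay-based challenge might look like:
function delayChallenge(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

delayChallenge(5000).then(() => {
  // Resolves instantly under the patched setTimeout, no 5-second wait.
  // A proof-of-work challenge can't be shortcut this way: the hashes
  // still have to be computed.
  console.log("delay 'challenge' passed with no real waiting");
});
```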
-
any "bot stopper" ends up stopping me somehow. Including anubis. I'm pretty sure ive been cursed by the rng gods because even at 40 KH/s, I get stuck on the pages for like 2 minutes before it tells me success.
Similar things like hcaptcha or cloudflare turnstile either never load or never succeed. Recaptcha gaslights me into thinking I was wrong.
Have you ever had a Blade Runner moment?