Cloudflare blocking AI crawlers
-
am I helping an amazon echo transcribe whatever it is surreptitiously listening to?
I've always wondered where the hell they scrape all that audio from. I mean, it's random shit.
Gotta be physicists or fanfic writers. I can't imagine any better options.
-
I’m pretty sure some auto drive company is getting the advantage
I'd reckon that a lot of that is spliced from pictures captured by Google Maps vehicles.
Both you and @[email protected] are correct. Google bought reCAPTCHA in 2009.
Here’s an article about it from 2018.
(╯°□°)╯︵ ┻━┻
Captcha if you can: how you’ve been training AI for years without realising it
And another from 2019! Captchas got harder for us because the AI had learned from our training.
-
Both you and @[email protected] are correct. Google bought reCAPTCHA in 2009.
Here’s an article about it from 2018.
(╯°□°)╯︵ ┻━┻
Captcha if you can: how you’ve been training AI for years without realising it
And another from 2019! Captchas got harder for us because the AI had learned from our training.
Fucking hell
-
Yeah, it's only anecdotal, but I feel like hobbyists like us, who do slightly unusual things without nefarious intent, are the ones who get hit with these sorts of issues the most. For example, I've noticed that some websites start throwing captchas at me or even just straight-up refuse to load with 403 Forbidden errors because I have my router set up to load-balance across two Internet connections. (At least, that's my guess as to why it's happening.)
Ahh yes. Imgur simply doesn't work anymore at my place, it always errors out with 403.
-
Yeah. Me choosing to use a VPN and a privacy-respecting browser has earnt me constant captchas.
For me just using Firefox on Linux seems to be enough to trigger them.
-
That uses proof of work rather than just detecting and blocking the bots.
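For anyone wondering what "proof of work" means here, a minimal hashcash-style sketch follows. It is not the code of any particular tool; the function names, the `challenge:nonce` format, and the difficulty value are all assumptions made up for illustration. The idea is that the server hands out a random challenge, the visitor has to burn CPU finding a nonce whose SHA-256 hash starts with enough zero bits, and the server checks the answer with a single hash.

```python
import hashlib
import secrets


def make_challenge() -> str:
    """Server side: issue a random, single-use challenge string (hypothetical format)."""
    return secrets.token_hex(16)


def leading_zero_bits(digest: bytes) -> int:
    """Count how many zero bits the digest starts with."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def _digest(challenge: str, nonce: int) -> bytes:
    return hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()


def solve(challenge: str, difficulty: int) -> int:
    """Client side: brute-force the smallest nonce that meets the difficulty."""
    nonce = 0
    while leading_zero_bits(_digest(challenge, nonce)) < difficulty:
        nonce += 1
    return nonce


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: checking costs one hash, however long solving took."""
    return leading_zero_bits(_digest(challenge, nonce)) >= difficulty


if __name__ == "__main__":
    challenge = make_challenge()
    nonce = solve(challenge, difficulty=16)  # roughly 65,000 hashes on average
    print(challenge, nonce, verify(challenge, nonce, 16))
```

The asymmetry is the point: one visitor barely notices the cost, while a crawler requesting millions of pages pays it millions of times.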
-
For me just using Firefox on Linux seems to be enough to trigger them.
Apple’s private relay does this too. And so does auto-login.
-
Well, they have access to logs showing who connects to 24 million websites, how they use those websites, and for how long. So if there’s anyone who knows what traffic is crawlers, and which crawlers are AI, it’s Cloudflare. There’s no way they wouldn’t know, they have all the data they would ever need to figure it out. In fact, there’s nobody on the internet who is better positioned to be able to identify AI crawlers than Cloudflare.
This.
They also have a form to submit AI crawlers.
Cloudflare could also easily maintain an anti-AI-crawler service completely by itself if it took a fee on top of its pay-per-crawl functionality. And considering Cloudflare already has all the tools and infrastructure to do this cheaply, providing a good service wouldn't be too hard.
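To make the easy half of this concrete, here is a rough sketch of spotting crawlers that announce themselves in the User-Agent header. This is an assumption-laden illustration, not how Cloudflare does it: the token list is a small hand-picked sample of publicly documented AI crawler names, and the function name is made up. Crawlers that lie about who they are need the kind of traffic-pattern data discussed above.

```python
# Sample of self-declared AI crawler tokens (illustrative, not exhaustive).
AI_CRAWLER_TOKENS = ("GPTBot", "ClaudeBot", "CCBot", "PerplexityBot", "Bytespider")


def is_declared_ai_crawler(user_agent: str) -> bool:
    """Return True if the User-Agent contains a known AI crawler token.

    Only catches crawlers that identify themselves honestly.
    """
    ua = user_agent.lower()
    return any(token.lower() in ua for token in AI_CRAWLER_TOKENS)


if __name__ == "__main__":
    bot = "Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)"
    human = "Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0"
    print(is_declared_ai_crawler(bot), is_declared_ai_crawler(human))
```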
-
FYI, you've added a link where the label is the URL and the actual link is empty. You can fix this by removing the [ and ]() around the link. If the link is there as plain text, it gets a hyperlink automatically: https://arstechnica.com/tech-policy/2025/07/pay-up-or-stop-scraping-cloudflare-program-charges-bots-for-each-crawl/
It was minding its own business and adding them... lol
-
Both you and @[email protected] are correct. Google bought reCAPTCHA in 2009.
Here’s an article about it from 2018.
(╯°□°)╯︵ ┻━┻
Captcha if you can: how you’ve been training AI for years without realising it
And another from 2019! Captchas got harder for us because the AI had learned from our training.
A few years ago I picked up an online gig with a company that trained AI. You'd log in to your dashboard and be presented with questions you had to answer in the best way, such as 'Is the earth round?'. Well, it's round in nature but is not perfectly round. So you'd have to pick the best solution from the answer list. It was interesting, but tedious. It put taters on the table, so I got that going for me....which is nice.
-
Gotta be physicists or fanfic writers. I can't imagine any better options.
idk... Some of the stuff I've heard sounds like they eavesdropped on a boardroom roundtable. Other stuff sounds like instructions on how to install something. They're probably siphoning data off YT.
-
idk... Some of the stuff I've heard sounds like they eavesdropped on a boardroom roundtable. Other stuff sounds like instructions on how to install something. They're probably siphoning data off YT.
off YT
so... physicists and fanfic writers, yeah
-
All this discussion about captchas raises a question for me: if fingerprinting is so accurate and easy that uBlock, no cookies and a VPN don't help... then why the fuck do I have to keep doing captchas?
To punish you for trying to protect yourself.
To extract micro-labour out of you (AI training)
To discourage you from privacy best practices
Oh BTW, the captcha will eventually contain unblockable ads