agnos.is Forums

Cloudflare blocking AI crawlers

  • andres4ny@social.ridetrans.it wrote:

    @irmadlad @lambalicious I just manually do the audio captcha every time, because the picture captchas often don't work correctly for me.

    It does bug me a little that I don't know what the audio captcha is being used for. Am I helping an Amazon Echo transcribe whatever it is surreptitiously listening to?

    #13 · irmadlad@lemmy.world wrote:

    am I helping an amazon echo transcribe whatever it is surreptitiously listening to?

    I've always wondered where the hell they scrape all that audio from. I mean, it's random shit.

    • 3dcadmin@lemmy.relayeasy.com wrote:

      Cloudflare trying to stop AI crawling somehow!

      https://arstechnica.com/tech-policy/2025/07/pay-up-or-stop-scraping-cloudflare-program-charges-bots-for-each-crawl/

      #14 · D wrote:

      How does it differentiate an "AI crawler" from any other crawler?
      A search engine crawler?
      Someone monitoring data to offer statistics?
      Archiving?

      This is not good. They are most likely doing the crawling themselves and then selling the data to the highest bidder. That bidder could obviously be OpenAI, for all we know.

      They just know that if they attach the phrase "this is anti-AI", a lot of people are not going to question anything.

      • In reply to 3dcadmin@lemmy.relayeasy.com

        #15 · E wrote:

        All this discussion about captchas raises a question for me: if fingerprinting is so accurate and easy that uBlock, blocking cookies and a VPN don't help, then why the fuck do I have to keep doing captchas?

        • In reply to 3dcadmin@lemmy.relayeasy.com

          #16 · deebster@infosec.pub wrote:

          FYI, you've added a link where the label is the URL and the actual link is empty. You can fix this by removing the [ and ]() around the link. If the link is there as plain text, it gets a hyperlink automatically: https://arstechnica.com/tech-policy/2025/07/pay-up-or-stop-scraping-cloudflare-program-charges-bots-for-each-crawl/

          • In reply to E (#15)

            #17 · D wrote:

            Because it never was about security. You're training LLMs for free.

            I'm pretty sure some auto drive company is getting the advantage, since a lot of captchas ask you to spot crosswalks, traffic lights, stairs, buses, mountains, motorcycles, etc. Wonder if it's fucking Tesla.

            • G wrote:

              do they just want everything to be crawled

              Yes. Web crawling has been a normal and vital part of the web from day 1. We'd have no search engines without crawlers.

              The web is user-centric by design. I'm sick of tech companies trying to flip the script and hoard information, most of which is not theirs to begin with (e.g. Google, Reddit, Twitter, Facebook, etc.).

              #18 · P wrote:

              I don’t think this blocks crawlers. About 1 in 5 websites use Cloudflare; the significant thing here is that AI scraping, NOT crawling, is now blocked by default on most of those sites.

              • In reply to D (#17)

                #19 · irmadlad@lemmy.world wrote:

                I’m pretty sure some auto drive company is getting the advantage

                I'd reckon that a lot of that is spliced from pictures captured by Google Maps vehicles.

                • In reply to D (#14)

                  #20 · _cryptagion@lemmy.dbzer0.com_ wrote:

                  Well, they have access to logs showing who connects to 24 million websites, how they use those websites, and for how long. So if there’s anyone who knows what traffic is crawlers, and which crawlers are AI, it’s Cloudflare. There’s no way they wouldn’t know, they have all the data they would ever need to figure it out. In fact, there’s nobody on the internet who is better positioned to be able to identify AI crawlers than Cloudflare.

                  • teft@lemmy.world wrote:

                    Seeing as they can't reliably detect whether I'm human, I don't have much confidence in this.

                    #21 · F wrote:

                    Yeah. Choosing to use a VPN and a privacy-respecting browser has earned me constant captchas.

                    • In reply to irmadlad@lemmy.world (#13)

                      #22 · L wrote:

                      Gotta be physicists or fanfic writers. I cannot imagine any better options.

                      • In reply to irmadlad@lemmy.world (#19)

                        #23 · W wrote:

                        Both you and @[email protected] are correct. Google bought reCAPTCHA in 2012.

                        Here’s an article about it from 2018.

                        (╯°□°)╯︵ ┻━┻

                        Captcha if you can: how you’ve been training AI for years without realising it

                        And another from 2019! Captchas got harder for us because the AI had learned from our training.

                        Why CAPTCHAs have gotten so difficult

                        • In reply to W (#23)

                          #24 · D wrote:

                          Fucking hell

                          • G wrote:

                            Yeah, it's only anecdotal, but I feel like hobbyists like us, who do slightly unusual things without nefarious intent, are the ones who get hit with these sorts of issues the most. For example, I've noticed that some websites start throwing captchas at me, or even straight-up refuse to load with 403 Forbidden errors, because I have my router set up to load-balance across two Internet connections. (At least, that's my guess as to why it's happening.)

                            #25 · S wrote:

                            Ahh yes. Imgur simply doesn't work anymore at my place; it always errors out with a 403.

                            • In reply to F (#21)

                              #26 · tywele@lemmy.dbzer0.com wrote:

                              For me just using Firefox on Linux seems to be enough to trigger them.

                              • tuxenthusiast@sopuli.xyz wrote:

                                Anubis!
                                https://github.com/TecharoHQ/anubis

                                #27 · C wrote:

                                That uses proof of work rather than just detecting and blocking the bots.
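The proof-of-work idea can be illustrated with a hashcash-style puzzle: the client has to burn CPU time finding a nonce, while the server checks the answer with a single hash. This is a minimal Python sketch under stated assumptions (SHA-256, difficulty counted as leading zero hex digits, and the hypothetical names `solve`, `verify` and the example challenge string). It is not Anubis's actual code, which runs its search in the visitor's browser.

```python
import hashlib
import itertools


def solve(challenge: str, difficulty: int) -> tuple[int, str]:
    """Client side (hypothetical): brute-force a nonce so that
    SHA-256(challenge + nonce) starts with `difficulty` zero hex digits."""
    target = "0" * difficulty
    for nonce in itertools.count():
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce, digest


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side (hypothetical): checking a submitted nonce costs one hash."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)


if __name__ == "__main__":
    # Assumption: a real server would issue a random, per-visitor challenge.
    challenge = "example-challenge"
    nonce, digest = solve(challenge, difficulty=4)
    print(nonce, digest)
    print(verify(challenge, nonce, difficulty=4))
```

Each extra zero digit multiplies the expected work by about 16, while verification stays at one hash. The idea is that a single visit costs a person almost nothing, but a crawler fetching millions of pages pays that cost millions of times.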

                                • In reply to tywele@lemmy.dbzer0.com (#26)

                                  #28 · A wrote:

                                  Apple’s private relay does this too. And so does auto-login.

                                  • In reply to _cryptagion@lemmy.dbzer0.com_ (#20)

                                    #29 · x00z@lemmy.world wrote:

                                    This.

                                    They also have a form to submit AI crawlers.

                                    Cloudflare could also easily run an anti-AI-crawler service entirely on its own by taking a fee on top of its pay-per-crawl functionality. And considering Cloudflare already has all the tools and infrastructure to do this cheaply, providing a good service wouldn't be too hard.

                                    • In reply to deebster@infosec.pub (#16)

                                      #30 · 3dcadmin@lemmy.relayeasy.com wrote:

                                      It was minding its own business and adding them... lol
                                      😀

                                      • In reply to W (#23)

                                        #31 · irmadlad@lemmy.world wrote:

                                        A few years ago I picked up an online gig with a company that trained AI. You'd log in to your dashboard and be presented with questions you had to answer in the best way, such as 'Is the earth round?' Well, it's round in nature, but not perfectly round, so you'd have to pick the best answer from the list. It was interesting, but tedious. It put taters on the table, so I got that going for me... which is nice.

                                        • In reply to L (#22)

                                          #32 · irmadlad@lemmy.world wrote:

                                          idk... Some of the stuff I've heard sounds like they eavesdropped on a boardroom roundtable. Other stuff sounds like instructions on how to install something. They're probably siphoning data off YT.
