
agnos.is Forums


Cloudflare blocking AI crawlers

selfhosted
34 Posts 24 Posters 0 Views
  • teft@lemmy.world

    Seeing as they can't reliably detect whether I'm human or not, I don't have much confidence in this.

    [email protected] wrote (#21):

    Yeah. Choosing to use a VPN and a privacy-respecting browser has earned me constant captchas.

    • irmadlad@lemmy.world

      am I helping an Amazon Echo transcribe whatever it's surreptitiously listening to?

      I've always wondered where the hell they scrape all that audio from. I mean, it's random shit.

      [email protected] wrote (#22):

      Gotta be physicists or fanfic writers. I can't imagine any better options.

      • irmadlad@lemmy.world

        I’m pretty sure some auto drive company is getting the advantage

        I'd reckon that a lot of that is spliced from pictures captured by Google Maps vehicles.

        [email protected] wrote (#23):

        Both you and @[email protected] are correct. Google bought reCAPTCHA in 2012.

        Here’s an article about it from 2018.

        (╯°□°)╯︵ ┻━┻

        Captcha if you can: how you’ve been training AI for years without realising it

        And another from 2019! Captchas got harder for us because the AI had learned from our training.

        Why CAPTCHAs have gotten so difficult

        • [email protected]

          [email protected] wrote (#24):

          Fucking hell

          • [email protected]

            Yeah, it's only anecdotal, but I feel like it's hobbyists like us, who do slightly unusual things without nefarious intent, who get hit with these sorts of issues the most. For example, I've noticed that some websites start throwing captchas at me, or even refuse to load outright with 403 Forbidden errors, because I have my router set up to load-balance across two Internet connections. (At least, that's my guess as to why it's happening.)

            [email protected] wrote (#25):

            Ahh yes. Imgur simply doesn't work anymore at my place; it always errors out with a 403.

            • [email protected]

              tywele@lemmy.dbzer0.com wrote (#26):

              For me just using Firefox on Linux seems to be enough to trigger them.

              • tuxenthusiast@sopuli.xyz

                Anubis!
                https://github.com/TecharoHQ/anubis

                [email protected] wrote (#27):

                That uses proof of work rather than just detecting and blocking the bots.
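The proof-of-work idea can be illustrated with a minimal hashcash-style sketch. It assumes SHA-256 over the challenge plus a nonce and a "leading zero hex digits" difficulty rule; all function names are hypothetical, and this is a generic illustration of the technique, not Anubis's actual implementation. What it shows: the visitor has to burn CPU to find a nonce, while the server checks the answer with a single hash, so the cost lands on whoever makes lots of requests instead of relying on telling bots apart from humans.

```python
import hashlib
import secrets

# Assumption: generic hashcash-style scheme (SHA-256, difficulty = number of
# leading zero hex digits). Illustrative only, not Anubis's real code.


def make_challenge() -> str:
    # Server side: issue a random, unguessable challenge per visitor.
    return secrets.token_hex(16)


def leading_zero_ok(digest_hex: str, difficulty: int) -> bool:
    # A digest "wins" if its hex form starts with `difficulty` zero digits.
    return digest_hex.startswith("0" * difficulty)


def solve(challenge: str, difficulty: int) -> int:
    # Client side: brute-force nonces until one wins.
    # Expected work is about 16 ** difficulty hashes.
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if leading_zero_ok(digest, difficulty):
            return nonce
        nonce += 1


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    # Server side: checking a claimed solution costs a single hash.
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return leading_zero_ok(digest, difficulty)


if __name__ == "__main__":
    challenge = make_challenge()
    nonce = solve(challenge, difficulty=4)
    print(challenge, nonce, verify(challenge, nonce, difficulty=4))
```

Each extra unit of difficulty multiplies the expected client work by 16, which is the knob such a system would tune so that one human page view stays cheap while a crawler fetching millions of pages pays a lot.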

                • tywele@lemmy.dbzer0.com

                  [email protected] wrote (#28):

                  Apple’s Private Relay does this too, and so does auto-login.

                  • _cryptagion@lemmy.dbzer0.com

                    Well, they have access to logs showing who connects to 24 million websites, how they use those websites, and for how long. So if there’s anyone who knows what traffic is crawlers, and which crawlers are AI, it’s Cloudflare. There’s no way they wouldn’t know, they have all the data they would ever need to figure it out. In fact, there’s nobody on the internet who is better positioned to be able to identify AI crawlers than Cloudflare.

                    x00z@lemmy.world wrote (#29):

                    This.

                    They also have a form to submit AI crawlers.

                    Cloudflare could also easily run an anti-AI-crawler service entirely on its own by taking a fee on top of its pay-per-crawl feature. And since Cloudflare already has all the tools and infrastructure to do this cheaply, providing a good service wouldn't be too hard.

                    • deebster@infosec.pub

                      FYI, you've added a link where the label is the URL and the actual link is empty. You can fix this by removing the [ and ]() around the link. If the link is there as plain text, it gets a hyperlink automatically: https://arstechnica.com/tech-policy/2025/07/pay-up-or-stop-scraping-cloudflare-program-charges-bots-for-each-crawl/

                      3dcadmin@lemmy.relayeasy.com wrote (#30):

                      It was minding its own business and adding them... lol
                      😀

                      • [email protected]

                        irmadlad@lemmy.world wrote (#31):

                        A few years ago I picked up an online gig with a company that trained AI. You'd log in to your dashboard and be presented with questions you had to answer in the best way, such as 'Is the earth round?' Well, it's round in nature but not perfectly round, so you'd have to pick the best answer from the list. It was interesting, but tedious. It put taters on the table, so I got that going for me... which is nice.

                        • [email protected]

                          irmadlad@lemmy.world wrote (#32):

                          idk... Some of the stuff I've heard sounds like they eavesdropped on a boardroom roundtable. Other stuff sounds like instructions on how to install something. They're probably siphoning data off YT.

                          • irmadlad@lemmy.world

                            [email protected] wrote (#33):

                            off YT

                            so... physicists and fanfic writers, yeah 😛

                            • [email protected]

                              All this discussion about captchas raises a question for me: if fingerprinting is so accurate and easy that uBlock, no cookies, and a VPN don't help... then why the fuck do I have to keep doing captchas?

                              [email protected] wrote (#34):

                              To punish you for trying to protect yourself.
                              To extract micro-labour out of you (AI training)
                              To discourage you from privacy best practices
                              Oh BTW, the captcha will eventually contain unblockable ads
