agnos.is Forums

The Open-Source Software Saving the Internet From AI Bot Scrapers

102 Posts 65 Posters 1 Views
  • bdonvr@thelemmy.clubB [email protected]

    That's not true; from what I read here, search indexer bots should be allowed through.

    [email protected]
    wrote last edited by
    #75

    If you allow my SearXNG search scraper through, then an AI scraper is indistinguishable from it.

    If you mean "Google and DuckDuckGo are whitelisted", then Lemmy will only be searchable there, on those specific whitelisted hosts. And Google's search indexer is also an AI scraper bot.

    1 Reply Last reply
    4
    • D [email protected]

      No, it'd still be a problem; every diff between commits is expensive to render to web, even if "only one company" is scraping it, "only one time". Many of these applications are designed for humans, not scrapers.

      [email protected]
      wrote last edited by
      #76

      If rendering data for scrapers were really the problem,
      then the solution would be simple: just offer downloadable dumps of the publicly available information.
      That would be extremely efficient and cost fractions of pennies in monthly bandwidth.
      Plus, the data would be far more usable for whatever they are using it for.

      The real problem is that hosts want the data to be freely available while keeping the ability to leverage it themselves later.

      I don't think we can have both of these.

      1 Reply Last reply
      0
      • R [email protected]

        it wasn't made for bitcoin originally? didn't know that!

        [email protected]
        wrote last edited by
        #77

        Originally called hashcash: http://hashcash.org/

        R 1 Reply Last reply
        10
        • 0 [email protected]

          Originally called hashcash: http://hashcash.org/

          [email protected]
          wrote last edited by
          #78

          you know it's old when it doesn't have ssl

          1 Reply Last reply
          8
          • phase@lemmy.8th.worldP [email protected]

            Support, pay, and get it 🙂

            [email protected]
            wrote last edited by
            #79

            Ah so it is possible to change it

            1 Reply Last reply
            1
            • K [email protected]

              "You criticize society yet you participate in it. Curious."

              [email protected]
              wrote last edited by [email protected]
              #80

              You can't freely download and edit society. You can download and edit this piece of software, because it is FOSS. You could download it right now and change or improve it however you'd like. But you won't, because you're just pretending to be concerned about issues that are made up. Or, being generous based on what I can read here, issues that only you have encountered.

              1 Reply Last reply
              2
              • U [email protected]

                Non paywalled link https://archive.is/VcoE1

                It basically boils down to making the browser do some CPU-heavy calculations before allowing access. This is no problem for a single user, but for a bot farm it would increase the amount of compute power they need 100x or more.

                exu@feditown.com
                [email protected]
                wrote last edited by
                #81

                It inherently blocks a lot of the simpler bots by requiring JavaScript as well.

                1 Reply Last reply
                8
                • isolatedscotch@discuss.tchncs.deI [email protected]

                  Adding to this, some sites set the difficulty way higher than others: nerdvpn's invidious and redlib instances take about 5 seconds and some ~20k hashes, while privacyredirect's instances are almost instant, with fewer than 50 hashes each time.

                  repletelocum@lemmy.blahaj.zone
                  [email protected]
                  wrote last edited by
                  #82

                  So they make the internet worse for poor people? I could get through 20k in a second, but someone with just an old laptop would take a few minutes, no?
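
As a back-of-the-envelope check on the question above, solve time is roughly the number of hashes needed divided by the device's hash rate. The sketch below uses the ~20k figure quoted in the thread; the two hash rates are made-up illustrative numbers, not measurements of any real device or browser.

```python
def seconds_to_solve(hashes_needed: int, hashes_per_second: float) -> float:
    """Estimated wall-clock time to grind through `hashes_needed` hashes."""
    if hashes_per_second <= 0:
        raise ValueError("hashes_per_second must be positive")
    return hashes_needed / hashes_per_second


# Hypothetical in-browser hash rates (assumptions, not benchmarks).
HYPOTHETICAL_RATES = {
    "fast desktop": 20_000.0,
    "old laptop": 150.0,
}

if __name__ == "__main__":
    for device, rate in HYPOTHETICAL_RATES.items():
        print(f"{device}: {seconds_to_solve(20_000, rate):.1f} s")
```

Under these assumed rates the gap is about a second versus a couple of minutes, which is the shape of the concern raised here. Real numbers depend on the browser and the site's chosen difficulty.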

                  jackbydev@programming.devJ isolatedscotch@discuss.tchncs.deI C M 4 Replies Last reply
                  6
                  • R [email protected]

                    To be honest, I need to ask my admin about that!

                    fxomt@lemmy.dbzer0.comF This user is from outside of this forum
                    fxomt@lemmy.dbzer0.comF This user is from outside of this forum
                    [email protected]
                    wrote last edited by [email protected]
                    #83

                    We don't use Anubis, but we use iocaine (?); see /0 for the announcement post.

                    1 Reply Last reply
                    2
                    • repletelocum@lemmy.blahaj.zoneR [email protected]

                      So they make the internet worse for poor people? I could get through 20k in a second, but someone with just an old laptop would take a few minutes, no?

                      jackbydev@programming.dev
                      [email protected]
                      wrote last edited by
                      #84

                      Well, it's the scrapers that are causing the problem.

                      1 Reply Last reply
                      6
                      • johnedwa@sopuli.xyzJ [email protected]

                        It’s not always about being first but about marketing.

                        And one has a cute catgirl mascot, the other a website that looks like a blockchain techbro startup.
                        I'm even willing to bet the number of people who set up Anubis just to get the cute splash screen isn't insignificant.

                        jackbydev@programming.dev
                        [email protected]
                        wrote last edited by
                        #85

                        Compare and contrast.

                        High-performance traffic management and next-gen security with multi-cloud management and observability. Built for the enterprise — open source at heart.

                        Sounds like some overpriced, vacuous, do-everything solution. Looks and sounds like every other tech website. Looks like it is meant to appeal to the people who still say "cyber". Looks and sounds like fauxpen source.

                        Weigh the soul of incoming HTTP requests to protect your website!

                        Cute. Adorable. Baby girl. Protect my website. Looks fun. Has one clear goal.

                        1 Reply Last reply
                        12
                        • mubelotix@jlai.luM [email protected]

                          Exactly. It's called proof-of-work and was originally invented to reduce spam emails but was later used by Bitcoin to control its growth speed

                          jackbydev@programming.dev
                          [email protected]
                          wrote last edited by
                          #86

                          It's funny that older captchas could be viewed as proof-of-work algorithms now, because image recognition has become so good (from being trained on captchas).

                          mubelotix@jlai.luM 1 Reply Last reply
                          4
                          • fattyfoods@feddit.nlF [email protected]
                            This post did not contain any content.
                            [email protected]
                            wrote last edited by
                            #87

                            I know people love anime, myself included, but this popping up on my work PC can be frustrating.

                            I 1 Reply Last reply
                            14
                            • T [email protected]

                              I know people love anime, myself included, but this popping up on my work PC can be frustrating.

                              [email protected]
                              wrote last edited by
                              #88

                              Contact the administrator to ask them to change the landing page

                              1 Reply Last reply
                              7
                              • fattyfoods@feddit.nlF [email protected]
                                This post did not contain any content.
                                [email protected]
                                wrote last edited by
                                #89

                                Fantastic article! Makes me less afraid to host a website with this potential solution

                                1 Reply Last reply
                                2
                                • I [email protected]

                                  All this could be avoided by making users submit photo ID to log in to an account.

                                  [email protected]
                                  wrote last edited by
                                  #90

                                  I don't think this would help:

                                  https://thispersondoesnotexist.com/

                                  I 1 Reply Last reply
                                  5
                                  • repletelocum@lemmy.blahaj.zoneR [email protected]

                                    So they make the internet worse for poor people? I could get through 20k in a second, but someone with just an old laptop would take a few minutes, no?

                                    isolatedscotch@discuss.tchncs.de
                                    [email protected]
                                    wrote last edited by
                                    #91

                                    So they make the internet worse for poor people? I could get through 20k in a second, but someone with just an old laptop would take a few minutes, no?

                                    i mean, kinda?
                                    you are absolutely right that someone with an old pc might need to wait a few extra seconds, but the speed is ultimately throttled by the browser

                                    1 Reply Last reply
                                    9
                                    • K [email protected]

                                      "You criticize society yet you participate in it. Curious."

                                      vendetta9076@sh.itjust.works
                                      [email protected]
                                      wrote last edited by
                                      #92

                                      If you think that's comparable then you're dumber than I thought.

                                      1 Reply Last reply
                                      0
                                      • H [email protected]

                                        I don't think this would help:

                                        https://thispersondoesnotexist.com/

                                        [email protected]
                                        wrote last edited by
                                        #93

                                        By photo ID, I don't mean just any photo, I mean "photo id" cryptographically signed by the state, certificates checked, database pinged, identity validated, the whole enchilada

                                        russjr08@bitforged.spaceR 1 Reply Last reply
                                        0
                                        • D [email protected]

                                          "Yes", for any bits the user sees. The frontend UI can be behind Anubis without issues. The API, including both user and federation, cannot. We expect "bots" to use an API, so you can't put human verification in front of it. These "bots" also include applications that aren't aware of Anubis, or are unable to pass it, like all third-party Lemmy apps.

                                          That does stop almost all generic AI scraping, though it does not prevent targeted abuse.

                                          [email protected]
                                          wrote last edited by
                                          #94

                                          The API, including both user and federation, cannot.

                                          This is theoretically an issue; in practice, however, Anubis only weighs requests that appear to come from a browser: https://anubis.techaro.lol/docs/design/how-anubis-works

                                          I just tested my instance with Jerboa and it seems to work just fine.
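
One way to picture "only weighs requests that appear to come from a browser" is a check on the User-Agent header. The sketch below is an assumption for illustration and is not Anubis's actual code or policy format: it treats any User-Agent containing "Mozilla" as browser-like, and the app User-Agent string is hypothetical. The linked documentation is the authority on the real rules.

```python
# Assumed marker for "looks like a browser"; real rules may differ.
BROWSER_MARKER = "Mozilla"


def should_challenge(headers: dict[str, str]) -> bool:
    """Return True if the request looks browser-like and would be
    sent a proof-of-work challenge under this simplified rule."""
    user_agent = headers.get("User-Agent", "")
    return BROWSER_MARKER in user_agent


if __name__ == "__main__":
    browser = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) Firefox/128.0"}
    app = {"User-Agent": "ExampleLemmyApp/1.0"}  # hypothetical app UA
    print(should_challenge(browser), should_challenge(app))
```

Under a rule like this, API clients and federation traffic that identify themselves honestly pass through untouched, which would be consistent with a third-party app continuing to work.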

                                          1 Reply Last reply
                                          1