agnos.is Forums
The Open-Source Software Saving the Internet From AI Bot Scrapers

opensource
102 Posts 65 Posters 1 Views
  • fattyfoods@feddit.nlF [email protected]
    This post did not contain any content.
    M This user is from outside of this forum
    [email protected]
    wrote last edited by
    #10

    <Stupidquestion>

    What advantage does this software provide over simply banning bots via robots.txt?

    </Stupidquestion>

    P K M M O 7 Replies Last reply
    20
    • K [email protected]

      I get that website admins are desperate for a solution, but Anubis is fundamentally flawed.

      It is hostile to the user, because it is very slow on older hardware and forces you to use JavaScript.

      It is bad for the environment, because it wastes energy on useless computations similar to mining crypto. If more websites start using this, that really adds up.

      But most importantly, it won't work in the end. These scraping tech companies have much deeper pockets and can use specialized hardware that is much more efficient at solving these challenges than a normal web browser.

      lukecooperatus@lemmy.mlL This user is from outside of this forum
      [email protected]
      wrote last edited by
      #11

      she’s working on a non cryptographic challenge so it taxes users’ CPUs less, and also thinking about a version that doesn’t require JavaScript

      Sounds like the developer of Anubis is aware and working on these shortcomings.

      Still, IMO these are minor short-term issues compared to the scope of the AI problem it's addressing.

      K 1 Reply Last reply
      25
      • K [email protected]

        On the contrary, I'm hoping for a solution that is better than this.

        Do you disagree with any part of my assessment? How do you think Anubis will work long term?

        D This user is from outside of this forum
        [email protected]
        wrote last edited by
        #12

        Long term, Anubis actually costs them millions or billions more in energy, because they have to run a browser and more code. Either way, they have to add shit to the bots, which costs all the companies money.

        1 Reply Last reply
        2
        • M [email protected]

          <Stupidquestion>

          What advantage does this software provide over simply banning bots via robots.txt?

          </Stupidquestion>

          P This user is from outside of this forum
          [email protected]
          wrote last edited by
          #13

          The scrapers ignore robots.txt. It doesn't really ban them; it just asks them not to access things, but they are programmed by assholes.

          1 Reply Last reply
          26
          • F [email protected]

            I’d like to use Anubis but the strange hentai character as a mascot is not too professional

            umbrella@lemmy.mlU This user is from outside of this forum
            [email protected]
            wrote last edited by
            #14

            I'm sure you could replace it if you really wanted to.

            1 Reply Last reply
            6
            • M [email protected]

              <Stupidquestion>

              What advantage does this software provide over simply banning bots via robots.txt?

              </Stupidquestion>

              K This user is from outside of this forum
              [email protected]
              wrote last edited by
              #15

              robots.txt relies on the client respecting the rules, for instance by identifying itself as a scraper.

              AI scrapers don't respect this trust, and thus robots.txt is meaningless.
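
To make the trust problem concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt contents, the `ExampleAIBot` name, and the `polite_fetch_allowed` helper are all made up for illustration. It only shows that the check runs on the client and depends on the name the client chooses to report.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that asks one crawler to stay out entirely.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def polite_fetch_allowed(user_agent: str, url: str) -> bool:
    """A well-behaved crawler asks this question before every request."""
    return parser.can_fetch(user_agent, url)


# The check is voluntary and keyed on a self-reported name.
blocked = polite_fetch_allowed("ExampleAIBot", "https://example.org/repo")

# The same crawler reporting a browser-like name passes the rules,
# and a crawler that never runs the check is not stopped by anything.
disguised = polite_fetch_allowed("Mozilla/5.0", "https://example.org/repo")
```

Nothing on the server enforces either answer, which is why tools like Anubis move the check to something the server can verify.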

              1 Reply Last reply
              75
              • M [email protected]

                <Stupidquestion>

                What advantage does this software provide over simply banning bots via robots.txt?

                </Stupidquestion>

                M This user is from outside of this forum
                [email protected]
                wrote last edited by
                #16

                The problem is that AI doesn't follow robots.txt, so Cloudflare and Anubis developed solutions.

                1 Reply Last reply
                6
                • K [email protected]

                  I get that website admins are desperate for a solution, but Anubis is fundamentally flawed.

                  It is hostile to the user, because it is very slow on older hardware and forces you to use JavaScript.

                  It is bad for the environment, because it wastes energy on useless computations similar to mining crypto. If more websites start using this, that really adds up.

                  But most importantly, it won't work in the end. These scraping tech companies have much deeper pockets and can use specialized hardware that is much more efficient at solving these challenges than a normal web browser.

                  B This user is from outside of this forum
                  [email protected]
                  wrote last edited by [email protected]
                  #17

                  It is basically instantaneous on my 12-year-old Kepler GPU Linux box. It has substantially less impact on the environment than AI tar pits and other deterrents. The cryptography involved is something almost all browsers from the last 10 years can do natively, whereas scrapers have to be individually programmed to do it. That makes it several orders of magnitude beyond impractical to repurpose every single corporate bot for it, only for that work to be rendered moot because it's an open-source project and someone will just update the cryptographic algorithm. These posts contain links to articles; if you read them, you might answer some of your own questions and have more to contribute to the conversation.
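
For readers wondering what kind of computation is being argued about, here is a rough sketch of a hashcash-style proof of work in Python. It assumes a SHA-256 challenge where the client must find a nonce whose hash starts with a number of zero hex digits. The `solve` and `verify` functions, the challenge string, and the difficulty value are illustrative assumptions, not Anubis's actual code or parameters.

```python
import hashlib
from itertools import count


def solve(challenge: str, difficulty: int) -> int:
    """Client side: brute-force a nonce until the hash has enough leading zeros."""
    target = "0" * difficulty
    for nonce in count():
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: a single hash is enough to check the client's work."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)


# Hypothetical challenge value; a real deployment would derive it per visitor.
challenge = "hypothetical-per-visitor-challenge"
nonce = solve(challenge, difficulty=3)
```

The asymmetry is the point of the design: solving takes many hash attempts on average, while checking takes one. A single visitor pays that cost once, but a scraper fetching millions of pages pays it over and over.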

                  K 1 Reply Last reply
                  7
                  • fattyfoods@feddit.nlF [email protected]
                    This post did not contain any content.
                    M This user is from outside of this forum
                    [email protected]
                    wrote last edited by
                    #18

                    Well, now that y'all put it that way, I think it was pretty naive of me to think that these companies, whose business model is basically theft, would honour a lousy robots.txt file...

                    1 Reply Last reply
                    1
                    • F [email protected]

                      I’d like to use Anubis but the strange hentai character as a mascot is not too professional

                      S This user is from outside of this forum
                      [email protected]
                      wrote last edited by [email protected]
                      #19

                      I actually really like the developer's rationale for why they use an anime character as the mascot.

                      The whole blog post is worth reading, but the TL;DR is this:

                      Of course, nothing is stopping you from forking the software to replace the art assets. Instead of doing that, I would rather you support the project and purchase a license for the commercial variant of Anubis named BotStopper. Doing this will make sure that the project is sustainable and that I don't burn myself out to a crisp in the process of keeping small internet websites open to the public.

                      At some level, I use the presence of the Anubis mascot as a "shopping cart test". If you either pay me for the unbranded version or leave the character intact, I'm going to take any bug reports more seriously. It's a positive sign that you are willing to invest in the project's success and help make sure that people developing vital infrastructure are not neglected.

                      C 1 Reply Last reply
                      46
                      • M [email protected]

                        <Stupidquestion>

                        What advantage does this software provide over simply banning bots via robots.txt?

                        </Stupidquestion>

                        M This user is from outside of this forum
                        [email protected]
                        wrote last edited by
                        #20

                        Well, now that y'all put it that way, I think it was pretty naive of me to think that these companies, whose business model is basically theft, would honour a lousy robots.txt file...

                        1 Reply Last reply
                        40
                        • S [email protected]

                          I actually really like the developer's rationale for why they use an anime character as the mascot.

                          The whole blog post is worth reading, but the TL;DR is this:

                          Of course, nothing is stopping you from forking the software to replace the art assets. Instead of doing that, I would rather you support the project and purchase a license for the commercial variant of Anubis named BotStopper. Doing this will make sure that the project is sustainable and that I don't burn myself out to a crisp in the process of keeping small internet websites open to the public.

                          At some level, I use the presence of the Anubis mascot as a "shopping cart test". If you either pay me for the unbranded version or leave the character intact, I'm going to take any bug reports more seriously. It's a positive sign that you are willing to invest in the project's success and help make sure that people developing vital infrastructure are not neglected.

                          C This user is from outside of this forum
                          [email protected]
                          wrote last edited by
                          #21

                          This is a great compromise honestly. More OSS devs need to be paid for their work and if an anime character helps do that, I'm all for it.

                          1 Reply Last reply
                          17
                          • fattyfoods@feddit.nlF [email protected]
                            This post did not contain any content.
                            bdonvr@thelemmy.clubB This user is from outside of this forum
                            [email protected]
                            wrote last edited by
                            #22

                            Ooh can this work with Lemmy without affecting federation?

                            I S B D I 6 Replies Last reply
                            22
                            • M [email protected]

                              <Stupidquestion>

                              What advantage does this software provide over simply banning bots via robots.txt?

                              </Stupidquestion>

                              O This user is from outside of this forum
                              [email protected]
                              wrote last edited by
                              #23

                              I mean, you could have read the article before asking, it's literally in there...

                              1 Reply Last reply
                              1
                              • bdonvr@thelemmy.clubB [email protected]

                                Ooh can this work with Lemmy without affecting federation?

                                I This user is from outside of this forum
                                [email protected]
                                wrote last edited by
                                #24

                                Yeah, it's already deployed on slrpnk.net. I see it momentarily every time I load the site.

                                1 Reply Last reply
                                3
                                • K [email protected]

                                  I get that website admins are desperate for a solution, but Anubis is fundamentally flawed.

                                  It is hostile to the user, because it is very slow on older hardware and forces you to use JavaScript.

                                  It is bad for the environment, because it wastes energy on useless computations similar to mining crypto. If more websites start using this, that really adds up.

                                  But most importantly, it won't work in the end. These scraping tech companies have much deeper pockets and can use specialized hardware that is much more efficient at solving these challenges than a normal web browser.

                                  D This user is from outside of this forum
                                  [email protected]
                                  wrote last edited by
                                  #25

                                  It takes like half a second on my Fairphone 3, and the CPU in this thing is absolute dogshit. I also doubt that the power consumption is particularly significant compared to the overhead of parsing, executing and JIT-compiling the 14MiB of JavaScript frameworks on the actual website.

                                  K 1 Reply Last reply
                                  8
                                  • fattyfoods@feddit.nlF [email protected]
                                    This post did not contain any content.
                                    K This user is from outside of this forum
                                    [email protected]
                                    wrote last edited by
                                    #26

                                    Just recently there was a guy on the NANOG list ranting that Anubis is the wrong approach and that people should just cache properly; then their servers would handle thousands of users and the bots wouldn't matter. Anyone who puts git online has no one to blame but themselves, e-commerce should just be made cacheable, etc. It seemed a bit idealistic, a bit detached from the current reality.

                                    Ah found it, here

                                    D 1 Reply Last reply
                                    8
                                    • bdonvr@thelemmy.clubB [email protected]

                                      Ooh can this work with Lemmy without affecting federation?

                                      S This user is from outside of this forum
                                      [email protected]
                                      wrote last edited by
                                      #27

                                      As long as it's not configured improperly. When the Forgejo devs added it, it broke downloading images with Kubernetes for a moment. Basically, you would need to make sure the user agent header used for federation is allowed.
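
As a sketch of what "allowing the user agent" could look like, here is a small Python function that decides whether a request gets the challenge page. The allowlist patterns, the exact shape of the federation and image-puller user agents, and the rule that only browser-like agents are challenged are all assumptions for illustration. This is not Anubis's real policy format.

```python
import re

# Hypothetical allowlist: automated clients that cannot solve a challenge
# and therefore must never be shown one.
ALLOWED_AGENTS = [
    re.compile(r"^Lemmy/"),       # assumed shape of a federation user agent
    re.compile(r"^containerd/"),  # assumed shape of a container image puller
]


def needs_challenge(user_agent: str) -> bool:
    """Return True if the request should be shown the challenge page."""
    if any(pattern.search(user_agent) for pattern in ALLOWED_AGENTS):
        return False
    # Assumption: browsers (and scrapers posing as browsers) announce
    # themselves with "Mozilla", so only those get challenged.
    return "Mozilla" in user_agent
```

With these made-up rules, `needs_challenge("Lemmy/0.19.3; +https://example.org")` is `False`, while a request claiming `Mozilla/5.0 (X11; Linux x86_64)` is challenged. A missing allowlist entry is exactly the sort of misconfiguration that would break federation or image pulls.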

                                      1 Reply Last reply
                                      2
                                      • K [email protected]

                                        I get that website admins are desperate for a solution, but Anubis is fundamentally flawed.

                                        It is hostile to the user, because it is very slow on older hardware and forces you to use JavaScript.

                                        It is bad for the environment, because it wastes energy on useless computations similar to mining crypto. If more websites start using this, that really adds up.

                                        But most importantly, it won't work in the end. These scraping tech companies have much deeper pockets and can use specialized hardware that is much more efficient at solving these challenges than a normal web browser.

                                        S This user is from outside of this forum
                                        [email protected]
                                        wrote last edited by
                                        #28

                                        A JavaScript-free check was released recently; I just read about it. It uses some refresh HTML tag and a delay. It's not the default though, since it's new.
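
The "refresh HTML tag" is presumably the standard `<meta http-equiv="refresh">` element, which tells the browser to load another URL after a delay without any script. Here is a minimal Python sketch of a server building such a page. The `refresh_challenge_page` function, the URL, the token, and the delay are hypothetical, and how Anubis actually validates the follow-up request is not covered here.

```python
from html import escape


def refresh_challenge_page(next_url: str, delay_seconds: int) -> str:
    """Build a page that tells the browser to load next_url after a delay."""
    # Escape the URL so it is safe inside an HTML attribute value.
    target = escape(next_url, quote=True)
    return (
        "<!doctype html>\n"
        "<html><head>\n"
        f'<meta http-equiv="refresh" content="{delay_seconds}; url={target}">\n'
        "<title>Checking your browser</title>\n"
        "</head><body><p>One moment, you will be redirected.</p></body></html>\n"
    )


# Hypothetical follow-up URL carrying a server-issued token.
page = refresh_challenge_page("/pass?token=hypothetical&x=1", delay_seconds=2)
```

A client that simply downloads the HTML and moves on never makes the second request, so the server can tell it apart from a browser that waited and followed the redirect.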

                                        phase@lemmy.8th.worldP 1 Reply Last reply
                                        1
                                        • F [email protected]

                                          I’d like to use Anubis but the strange hentai character as a mascot is not too professional

                                          chickenandrice@sh.itjust.worksC This user is from outside of this forum
                                          [email protected]
                                          wrote last edited by
                                          #29

                                          Oh no why can't the web be even more boring and professional

                                          F 1 Reply Last reply
                                          10