agnos.is Forums

  1. Home
  2. Open Source
  3. The Open-Source Software Saving the Internet From AI Bot Scrapers

The Open-Source Software Saving the Internet From AI Bot Scrapers

opensource
102 Posts 65 Posters 1 Views
  • F [email protected]

    Probably because the creator had a blog post that got shared around at a point in time where this exact problem was resonating with users.

    It's not always about being first but about marketing.

    johnedwa@sopuli.xyzJ This user is from outside of this forum
    [email protected]
    wrote last edited by [email protected]
    #68

    It’s not always about being first but about marketing.

    And one has a cute catgirl mascot, the other a website that looks like a blockchain techbro startup.
    I'm even willing to bet the number of people that set up Anubis just to get the cute splash screen isn't insignificant.

    jackbydev@programming.devJ 1 Reply Last reply
    22
    • lime@feddit.nuL [email protected]

      anubis is basically a bitcoin miner, with the difficulty turned way down (and obviously not resulting in any coins), so it's inherently random. if it takes minutes it does seem like something is wrong though. maybe a network error?

      isolatedscotch@discuss.tchncs.deI This user is from outside of this forum
      [email protected]
      wrote last edited by [email protected]
      #69

      adding to this, some sites set the difficulty way higher than others. nerdvpn's invidious and redlib instances take about 5 seconds and some ~20k hashes, while privacyredirect's instances are almost instant, with fewer than 50 hashes each time
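To make the difficulty knob concrete, here is a minimal sketch of a hash-based proof-of-work challenge. It assumes the common scheme where the client searches for a nonce so that SHA-256(challenge + nonce) starts with a given number of zero hex digits. The names `solve`, `verify` and the challenge string are hypothetical, and this is not taken from Anubis' source.

```python
import hashlib
from itertools import count


def solve(challenge: str, difficulty: int) -> tuple[int, str]:
    """Try nonces 0, 1, 2, ... until the hex digest starts with `difficulty` zeros."""
    prefix = "0" * difficulty
    for nonce in count():
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """The checking side needs a single hash, however long the search took."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)


if __name__ == "__main__":
    # Hypothetical challenge string; a real server would issue a fresh one per visitor.
    for difficulty in (1, 2, 4):
        nonce, _ = solve("example-challenge", difficulty)
        print(f"difficulty {difficulty}: {nonce + 1} hashes tried, "
              f"about {16 ** difficulty} expected on average")
```

Each extra zero hex digit multiplies the average work by 16, so under this assumed scheme a gap like "under 50 hashes" versus "~20k hashes" would correspond to only a couple of difficulty steps. The actual count varies from run to run because finding a matching nonce is down to luck.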

      repletelocum@lemmy.blahaj.zoneR 1 Reply Last reply
      15
      • mubelotix@jlai.luM [email protected]

        Exactly. It's called proof-of-work and was originally invented to reduce spam emails but was later used by Bitcoin to control its growth speed

        R This user is from outside of this forum
        [email protected]
        wrote last edited by
        #70

        it wasn't made for bitcoin originally? didn't know that!

        0 1 Reply Last reply
        2
        • I [email protected]

          Yes, it would make Lemmy as unsearchable as Discord, instead of as unsearchable as Pinterest.

          bdonvr@thelemmy.clubB This user is from outside of this forum
          [email protected]
          wrote last edited by
          #71

          That's not true, search indexer bots should be allowed through from what I read here.

          I 1 Reply Last reply
          1
          • bdonvr@thelemmy.clubB [email protected]

            Ooh can this work with Lemmy without affecting federation?

            R This user is from outside of this forum
            [email protected]
            wrote last edited by
            #72

            To be honest, I need to ask my admin about that!

            fxomt@lemmy.dbzer0.comF 1 Reply Last reply
            2
            • I [email protected]

              All this could be avoided by requiring a photo ID to log in to an account.

              anzo@programming.devA This user is from outside of this forum
              [email protected]
              wrote last edited by
              #73

              That's awful, it means I would get my photo ID stolen hundreds of times per day. There's also thisfacedoesntexists... so it won't work, for many reasons. Not all websites require an account. And even those that do, when they ask for "personal verification" (like dating apps), have a hard time implementing just that. Most "serious" cases use human review of the photo and a video that has your face and you move in and out of an oval shape...

              I 1 Reply Last reply
              12
              • anzo@programming.devA [email protected]

                That's awful, it means I would get my photo ID stolen hundreds of times per day. There's also thisfacedoesntexists... so it won't work, for many reasons. Not all websites require an account. And even those that do, when they ask for "personal verification" (like dating apps), have a hard time implementing just that. Most "serious" cases use human review of the photo and a video that has your face and you move in and out of an oval shape...

                I This user is from outside of this forum
                [email protected]
                wrote last edited by
                #74

                Also, you must drink a verification can!

                1 Reply Last reply
                4
                • bdonvr@thelemmy.clubB [email protected]

                  That's not true, search indexer bots should be allowed through from what I read here.

                  I This user is from outside of this forum
                  [email protected]
                  wrote last edited by
                  #75

                  If you allow my SearXNG search scraper, then an AI scraper is indistinguishable from it.

                  If you mean "Google and DuckDuckGo are whitelisted", then Lemmy will only be searchable there, on those specific whitelisted hosts. And Google's search indexer is also an AI scraper bot.

                  1 Reply Last reply
                  4
                  • D [email protected]

                    No, it'd still be a problem; every diff between commits is expensive to render to web, even if "only one company" is scraping it, "only one time". Many of these applications are designed for humans, not scrapers.

                    I This user is from outside of this forum
                    [email protected]
                    wrote last edited by
                    #76

                    If rendering data for scrapers was really the problem,
                    then the solution is simple: just offer downloadable dumps of the publicly available information.
                    That would be extremely efficient and cost fractions of pennies in monthly bandwidth.
                    Plus, the data would be far more usable for whatever they are using it for.

                    The problem is trying to have freely available data, but for the host to maintain the ability to leverage this data later.

                    I don't think we can have both of these.

                    1 Reply Last reply
                    0
                    • R [email protected]

                      it wasn't made for bitcoin originally? didn't know that!

                      0 This user is from outside of this forum
                      [email protected]
                      wrote last edited by
                      #77

                      Originally called hashcash: http://hashcash.org/
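For anyone curious what that looks like, below is a rough sketch of a hashcash-style stamp: the sender keeps bumping a counter until the SHA-1 hash of the stamp has enough leading zero bits, and the receiver checks it with one hash. The stamp layout here is deliberately simplified and hypothetical (real hashcash stamps carry more fields, such as a date and a random salt), and `mint`, `check` and `leading_zero_bits` are names made up for the example.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def mint(resource: str, bits: int) -> str:
    """Search for a counter that gives the stamp `bits` leading zero bits."""
    for counter in count():
        stamp = f"1:{bits}:{resource}:{counter}"  # simplified layout, not the real format
        if leading_zero_bits(hashlib.sha1(stamp.encode()).digest()) >= bits:
            return stamp


def check(stamp: str, resource: str, bits: int) -> bool:
    """Verify a stamp: right recipient, and enough zero bits in its hash."""
    fields = stamp.split(":")
    if len(fields) != 4 or fields[2] != resource:
        return False
    return leading_zero_bits(hashlib.sha1(stamp.encode()).digest()) >= bits


if __name__ == "__main__":
    stamp = mint("someone@example.org", 12)  # 12 bits keeps the demo fast
    print(stamp, check(stamp, "someone@example.org", 12))
```

The point for spam is the asymmetry: minting costs about 2^bits hashes per recipient on average, while checking costs one hash.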

                      R 1 Reply Last reply
                      10
                      • 0 [email protected]

                        Originally called hashcash: http://hashcash.org/

                        R This user is from outside of this forum
                        [email protected]
                        wrote last edited by
                        #78

                        you know it's old when it doesn't have ssl

                        1 Reply Last reply
                        8
                        • phase@lemmy.8th.worldP [email protected]

                          Support, pay, and get it 🙂

                          F This user is from outside of this forum
                          [email protected]
                          wrote last edited by
                          #79

                          Ah so it is possible to change it

                          1 Reply Last reply
                          1
                          • K [email protected]

                            "You criticize society yet you participate in it. Curious."

                            B This user is from outside of this forum
                            [email protected]
                            wrote last edited by [email protected]
                            #80

                            You can't freely download and edit society. You can download and edit this piece of software here, because this is FOSS. You could download it now and change it, or improve it however you'd like. But you can't, because you're just pretending to be concerned about issues that are made up. Or, being generous from what I can read here, issues that only you have encountered.

                            1 Reply Last reply
                            2
                            • U [email protected]

                              Non paywalled link https://archive.is/VcoE1

                              It basically boils down to making the browser do some cpu heavy calculations before allowing access. This is no problem for a single user, but for a bot farm this would increase the amount of compute power they need 100x or more.

                              exu@feditown.comE This user is from outside of this forum
                              [email protected]
                              wrote last edited by
                              #81

                              It inherently blocks a lot of the simpler bots by requiring JavaScript as well.

                              1 Reply Last reply
                              8
                              • isolatedscotch@discuss.tchncs.deI [email protected]

                                adding to this, some sites set the difficulty way higher than others. nerdvpn's invidious and redlib instances take about 5 seconds and some ~20k hashes, while privacyredirect's instances are almost instant, with fewer than 50 hashes each time

                                repletelocum@lemmy.blahaj.zoneR This user is from outside of this forum
                                [email protected]
                                wrote last edited by
                                #82

                                So they make the internet worse for poor people? I could get through 20k in a second, but someone with just an old laptop would take a few minutes, no?
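A quick back-of-the-envelope sketch of that question. The hash rates below are made-up illustrative numbers, not measurements of any device, and the difficulty model (16 times more work per zero hex digit) is an assumption about the scheme rather than something confirmed in this thread.

```python
def expected_hashes(difficulty: int) -> int:
    """Average hashes needed when each extra zero hex digit multiplies the work by 16."""
    return 16 ** difficulty


def seconds_for(hashes: int, hashes_per_second: float) -> float:
    """Time to grind through a given number of hashes at a given rate."""
    return hashes / hashes_per_second


def rate_needed(hashes: int, seconds: float) -> float:
    """Hash rate at which `hashes` attempts take exactly `seconds`."""
    return hashes / seconds


def chance_solved_within(hashes: int, difficulty: int) -> float:
    """Probability of hitting a valid nonce within `hashes` independent tries."""
    p = 1 / expected_hashes(difficulty)
    return 1 - (1 - p) ** hashes


if __name__ == "__main__":
    # Hypothetical devices: hash rates are placeholders for illustration only.
    for name, rate in (("fast desktop", 20_000), ("old laptop", 1_000)):
        print(f"{name}: {seconds_for(20_000, rate):.0f} s for 20k hashes")
    print(f"rate at which 20k hashes take 3 minutes: {rate_needed(20_000, 180):.0f} hashes/s")
    print(f"chance of finishing within 20k hashes at difficulty 4: "
          f"{chance_solved_within(20_000, 4):.2f}")
```

So whether 20k hashes means "a few minutes" depends entirely on how slow the device really is: it would have to manage only around a hundred hashes per second. Slower hardware does pay proportionally more, and luck adds variance on top of that.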

                                jackbydev@programming.devJ isolatedscotch@discuss.tchncs.deI C M 4 Replies Last reply
                                6
                                • R [email protected]

                                  To be honest, I need to ask my admin about that!

                                  fxomt@lemmy.dbzer0.comF This user is from outside of this forum
                                  [email protected]
                                  wrote last edited by [email protected]
                                  #83

                                  We don't use anubis but we use iocaine (?), see /0 for the announcement post

                                  1 Reply Last reply
                                  2
                                  • repletelocum@lemmy.blahaj.zoneR [email protected]

                                    So they make the internet worse for poor people? I could get through 20k in a second, but someone with just an old laptop would take a few minutes, no?

                                    jackbydev@programming.devJ This user is from outside of this forum
                                    [email protected]
                                    wrote last edited by
                                    #84

                                    Well, it's the scrapers that are causing the problem.

                                    1 Reply Last reply
                                    6
                                    • johnedwa@sopuli.xyzJ [email protected]

                                      It’s not always about being first but about marketing.

                                      And one has a cute catgirl mascot, the other a website that looks like a blockchain techbro startup.
                                      I'm even willing to bet the number of people that set up Anubis just to get the cute splash screen isn't insignificant.

                                      jackbydev@programming.devJ This user is from outside of this forum
                                      [email protected]
                                      wrote last edited by
                                      #85

                                      Compare and contrast.

                                      High-performance traffic management and next-gen security with multi-cloud management and observability. Built for the enterprise — open source at heart.

                                      Sounds like some overpriced, vacuous, do-everything solution. Looks and sounds like every other tech website. Looks like it is meant to appeal to the people who still say "cyber". Looks and sounds like fauxpen source.

                                      Weigh the soul of incoming HTTP requests to protect your website!

                                      Cute. Adorable. Baby girl. Protect my website. Looks fun. Has one clear goal.

                                      1 Reply Last reply
                                      12
                                      • mubelotix@jlai.luM [email protected]

                                        Exactly. It's called proof-of-work and was originally invented to reduce spam emails but was later used by Bitcoin to control its growth speed

                                        jackbydev@programming.devJ This user is from outside of this forum
                                        [email protected]
                                        wrote last edited by
                                        #86

                                        It's funny that older captchas could be viewed as proof-of-work algorithms now because image recognition is so good. (From using captchas.)

                                        mubelotix@jlai.luM 1 Reply Last reply
                                        4
                                        • fattyfoods@feddit.nlF [email protected]
                                          This post did not contain any content.
                                          T This user is from outside of this forum
                                          [email protected]
                                          wrote last edited by
                                          #87

                                          I know people love anime, myself included, but this popping up on my work PC can be frustrating

                                          I 1 Reply Last reply
                                          14