agnos.is Forums

The Open-Source Software Saving the Internet From AI Bot Scrapers

opensource
102 Posts 65 Posters 1 Views
  • D [email protected]

    No, it'd still be a problem; every diff between commits is expensive to render to web, even if "only one company" is scraping it, "only one time". Many of these applications are designed for humans, not scrapers.

    I This user is from outside of this forum
    [email protected]
    wrote last edited by
    #76

    If rendering data for scrapers were really the problem,
    then the solution is simple: just offer downloadable dumps of the publicly available information.
    That would be extremely efficient and cost fractions of a penny in monthly bandwidth.
    Plus, the data would be far more usable for whatever they are using it for.

    The problem is wanting the data to be freely available while the host keeps the ability to leverage that data later.

    I don't think we can have both of these.

    1 Reply Last reply
    0
    • R [email protected]

      it wasn't made for bitcoin originally? didn't know that!

      0 This user is from outside of this forum
      [email protected]
      wrote last edited by
      #77

      Originally called hashcash: http://hashcash.org/

      R 1 Reply Last reply
      10
      • 0 [email protected]

        Originally called hashcash: http://hashcash.org/

        R This user is from outside of this forum
        [email protected]
        wrote last edited by
        #78

        you know it's old when it doesn't have ssl

        1 Reply Last reply
        8
        • phase@lemmy.8th.worldP [email protected]

          Support, pay, and get it 🙂

          F This user is from outside of this forum
          [email protected]
          wrote last edited by
          #79

          Ah so it is possible to change it

          1 Reply Last reply
          1
          • K [email protected]

            "You criticize society yet you participate in it. Curious."

            B This user is from outside of this forum
            [email protected]
            wrote last edited by [email protected]
            #80

            You can't freely download and edit society. You can download and edit this piece of software here, because this is FOSS. You could download it now and change it, or improve it however you'd like. But you won't, because you're just pretending to be concerned about issues that are made up. Or, being generous based on what I can read here, issues that only you have encountered.

            1 Reply Last reply
            2
            • U [email protected]

              Non paywalled link https://archive.is/VcoE1

              It basically boils down to making the browser do some cpu heavy calculations before allowing access. This is no problem for a single user, but for a bot farm this would increase the amount of compute power they need 100x or more.

              exu@feditown.comE This user is from outside of this forum
              [email protected]
              wrote last edited by
              #81

              It inherently blocks a lot of the simpler bots by requiring JavaScript as well.
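The "CPU heavy calculation" described in the quoted post can be sketched roughly as follows. This is a minimal hashcash-style illustration, not Anubis's actual code: the challenge string, the `challenge:nonce` format, and the difficulty measured in leading zero bits of a SHA-256 digest are all assumptions made for the example. The point it shows is the asymmetry: the client has to try many nonces, while the server checks the answer with a single hash.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count how many zero bits the digest starts with."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        # 8 - bit_length() is the number of zero bits at the top of this byte.
        bits += 8 - byte.bit_length()
        break
    return bits


def solve(challenge: str, difficulty_bits: int) -> int:
    """Brute-force a nonce; this is the expensive part, done by the client."""
    for nonce in count():
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
        if leading_zero_bits(digest) >= difficulty_bits:
            return nonce


def verify(challenge: str, nonce: int, difficulty_bits: int) -> bool:
    """A single hash; this is the cheap part, done by the server."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty_bits


# Hypothetical challenge value and difficulty, chosen only for the demo.
nonce = solve("example-challenge", 12)
print(verify("example-challenge", nonce, 12))
```

One visitor pays this cost once per challenge, whereas a scraper requesting many pages from many addresses may have to pay it again and again, which is where the extra compute comes from.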

              1 Reply Last reply
              8
              • isolatedscotch@discuss.tchncs.deI [email protected]

                adding to this, some sites set the difficulty way higher than others: nerdvpn's invidious and redlib instances take about 5 seconds and some ~20k hashes, while privacyredirect's instances are almost instant with less than 50 hashes each time

                repletelocum@lemmy.blahaj.zoneR This user is from outside of this forum
                [email protected]
                wrote last edited by
                #82

                So they make the internet worse for poor people? I could get through 20k in a second, but someone with just an old laptop would take a few minutes, no?

                jackbydev@programming.devJ isolatedscotch@discuss.tchncs.deI C M 4 Replies Last reply
                6
                • R [email protected]

                  To be honest, I need to ask my admin about that!

                  fxomt@lemmy.dbzer0.comF This user is from outside of this forum
                  [email protected]
                  wrote last edited by [email protected]
                  #83

                  We don't use anubis but we use iocaine (?), see /0 for the announcement post

                  1 Reply Last reply
                  2
                  • repletelocum@lemmy.blahaj.zoneR [email protected]

                    So they make the internet worse for poor people? I could get through 20k in a second, but someone with just an old laptop would take a few minutes, no?

                    jackbydev@programming.devJ This user is from outside of this forum
                    [email protected]
                    wrote last edited by
                    #84

                    Well, it's the scrapers that are causing the problem.

                    1 Reply Last reply
                    6
                    • johnedwa@sopuli.xyzJ [email protected]

                      It's not always about being first but about marketing.

                      And one has a cute catgirl mascot, the other a website that looks like a blockchain techbro startup.
                      I'm even willing to bet the number of people who set up Anubis just to get the cute splash screen isn't insignificant.

                      jackbydev@programming.devJ This user is from outside of this forum
                      [email protected]
                      wrote last edited by
                      #85

                      Compare and contrast.

                      High-performance traffic management and next-gen security with multi-cloud management and observability. Built for the enterprise - open source at heart.

                      Sounds like some overpriced, vacuous, do-everything solution. Looks and sounds like every other tech website. Looks like it is meant to appeal to the people who still say "cyber". Looks and sounds like fauxpen source.

                      Weigh the soul of incoming HTTP requests to protect your website!

                      Cute. Adorable. Baby girl. Protect my website. Looks fun. Has one clear goal.

                      1 Reply Last reply
                      12
                      • mubelotix@jlai.luM [email protected]

                        Exactly. It's called proof-of-work and was originally invented to reduce spam emails but was later used by Bitcoin to control its growth speed

                        jackbydev@programming.devJ This user is from outside of this forum
                        [email protected]
                        wrote last edited by
                        #86

                        It's funny that older captchas could be viewed as proof of work algorithms now because image recognition is so good. (From using captchas.)

                        mubelotix@jlai.luM 1 Reply Last reply
                        4
                        • fattyfoods@feddit.nlF [email protected]
                          This post did not contain any content.
                          T This user is from outside of this forum
                          [email protected]
                          wrote last edited by
                          #87

                          I know people love anime, myself included, but this popping up on my work PC can be frustrating

                          I 1 Reply Last reply
                          14
                          • T [email protected]

                            I know people love anime, myself included, but this popping up on my work PC can be frustrating

                            I This user is from outside of this forum
                            [email protected]
                            wrote last edited by
                            #88

                            Contact the administrator to ask them to change the landing page

                            1 Reply Last reply
                            7
                            • fattyfoods@feddit.nlF [email protected]
                              This post did not contain any content.
                              I This user is from outside of this forum
                              [email protected]
                              wrote last edited by
                              #89

                              Fantastic article! Makes me less afraid to host a website with this potential solution

                              1 Reply Last reply
                              2
                              • I [email protected]

                                All this could be avoided by making users submit photo ID to log in to an account.

                                H This user is from outside of this forum
                                [email protected]
                                wrote last edited by
                                #90

                                I don't think this would help:

                                https://thispersondoesnotexist.com/

                                I 1 Reply Last reply
                                5
                                • repletelocum@lemmy.blahaj.zoneR [email protected]

                                  So they make the internet worse for poor people? I could get through 20k in a second, but someone with just an old laptop would take a few minutes, no?

                                  isolatedscotch@discuss.tchncs.deI This user is from outside of this forum
                                  [email protected]
                                  wrote last edited by
                                  #91

                                  So they make the internet worse for poor people? I could get through 20k in a second, but someone with just an old laptop would take a few minutes, no?

                                  i mean, kinda?
                                  you are absolutely right that someone with an old pc might need to wait a few extra seconds, but the speed is ultimately throttled by the browser
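To put rough numbers on the "old laptop" question, here is a back-of-the-envelope sketch. It assumes the difficulty is expressed as a count of leading zero hex digits in the hash, so that a solver needs about 16^d attempts on average; that convention and the second hash rate below are assumptions for illustration, not measurements. The only figure taken from the thread is the claim of roughly 20k hashes per second on a fast machine.

```python
def expected_hashes(hex_zeros: int) -> int:
    """Average attempts needed to find a hash with this many leading zero hex digits."""
    return 16 ** hex_zeros


def expected_seconds(hex_zeros: int, hashes_per_second: float) -> float:
    """Average solve time for a client with the given hash rate."""
    return expected_hashes(hex_zeros) / hashes_per_second


# 20_000 h/s echoes the "20k in a second" remark in the thread;
# 2_000 h/s is a hypothetical machine ten times slower.
for label, rate in [("fast machine", 20_000), ("slow machine (hypothetical)", 2_000)]:
    for difficulty in (2, 4):
        seconds = expected_seconds(difficulty, rate)
        print(f"{label}, difficulty {difficulty}: ~{seconds:.2f} s on average")
```

These are averages only. The actual number of attempts for any single challenge is random, so one visitor may get lucky after a handful of hashes while another needs several times the average.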

                                  1 Reply Last reply
                                  9
                                  • K [email protected]

                                    "You criticize society yet you participate in it. Curious."

                                    vendetta9076@sh.itjust.worksV This user is from outside of this forum
                                    [email protected]
                                    wrote last edited by
                                    #92

                                    If you think that's comparable then you're dumber than I thought.

                                    1 Reply Last reply
                                    0
                                    • H [email protected]

                                      I don't think this would help:

                                      https://thispersondoesnotexist.com/

                                      I This user is from outside of this forum
                                      [email protected]
                                      wrote last edited by
                                      #93

                                      By photo ID, I don't mean just any photo, I mean "photo id" cryptographically signed by the state, certificates checked, database pinged, identity validated, the whole enchilada

                                      russjr08@bitforged.spaceR 1 Reply Last reply
                                      0
                                      • D [email protected]

                                        "Yes", for any bits the user sees. The frontend UI can be behind Anubis without issues. The API, including both user and federation, cannot. We expect "bots" to use an API, so you can't put human verification in front of it. These "bots" also include applications that aren't aware of Anubis, or are unable to pass it, like all third party Lemmy apps.

                                        That does stop almost all generic AI scraping, though it does not prevent targeted abuse.

                                        B This user is from outside of this forum
                                        [email protected]
                                        wrote last edited by
                                        #94

                                        The API, including both user and federation, cannot.

                                        This is theoretically an issue; in practice, however, Anubis only weighs requests that appear to come from a browser: https://anubis.techaro.lol/docs/design/how-anubis-works

                                        I just tested my instance with Jerboa and it seems to work just fine.
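The idea of only weighing requests that "appear to come from a browser" can be illustrated with a very small check on the User-Agent header. This is a simplified sketch, not Anubis's real policy engine: the rule that a browser-like client is one whose User-Agent contains "Mozilla" is an assumption here, and the app User-Agent string in the demo is hypothetical.

```python
def needs_challenge(user_agent: str) -> bool:
    """Challenge only clients that present themselves as a web browser.

    Assumption for this sketch: browser-like means the User-Agent
    header contains the substring "Mozilla".
    """
    return "Mozilla" in user_agent


# A typical browser-style User-Agent is sent to the proof-of-work page.
print(needs_challenge("Mozilla/5.0 (X11; Linux x86_64) Gecko/20100101 Firefox/128.0"))

# A hypothetical third-party app or federation client passes straight through.
print(needs_challenge("ExampleLemmyApp/1.0"))
```

Under a rule like this, API clients and federation traffic that identify themselves honestly are never shown a challenge, which would be consistent with a third-party app continuing to work.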

                                        1 Reply Last reply
                                        1
                                        • jackbydev@programming.devJ [email protected]

                                          It's funny that older captchas could be viewed as proof of work algorithms now because image recognition is so good. (From using captchas.)

                                          mubelotix@jlai.luM This user is from outside of this forum
                                          [email protected]
                                          wrote last edited by [email protected]
                                          #95

                                          Interesting stance. I have bought many tens of thousands of captcha solves for legitimate reasons, and I have now completely lost faith in them

                                          1 Reply Last reply
                                          1