agnos.is Forums

Is there a way to reduce the number of AI generated websites that appear in search results?

asklemmy
41 Posts 31 Posters 3 Views
  • L [email protected]

    If it were websites made with AI, why wouldn't Kagi find them just the same? Search engines just search key terms. Can't see how it would know if the term was typed by a person or a bot. That said, I used SearXNG and it wasn't bad.

    [email protected]
    wrote last edited by
    #12

    Ah, I guess I misunderstood the problem

    1 Reply Last reply
    3
    • C [email protected]

      I keep trying to find things like “making waffles from sour dough discard” and all the sites are the same: long meandering paragraphs full of links to other things on the site with dubious instructions.

      Considering that at this point I can pretty much identify this type of site by looking at it, are there good extensions or search engines that might remove them from search results?

      [email protected]
      wrote last edited by
      #13

      I've almost given up on just searching the whole internet for something. I either filter by eye for sites I trust in the results, or add a filter to the query. There are usually a handful of sites I trust on a given topic.

      1 Reply Last reply
      1
      • C [email protected]

        Update: for now it seems DuckDuckGo's date range filter is kind of a magic bullet for this type of thing. Set the range between 2010 and 2020 and the top results for a lot of temporally agnostic searches get much better.

        [email protected]
        wrote last edited by
        #14

        I've switched to presearch.com long ago. No more tracking.

        B 1 Reply Last reply
        6

          tal@lemmy.today
          [email protected]
          wrote last edited by [email protected]
          #15

          No, because there's no reliable way to distinguish AI-generated spam sites from non-AI-generated spam sites. I'll also add that I don't expect there to be one promptly forthcoming: any attempt to identify them is going to run into improved systems, and that's gonna happen even if the systems aren't explicitly intending to evade detection. If it were easy, Google would have done so years back. I can recognize some now, but the SEO spam crowd that's creating this is trying hard to pollute search engine results, and if someone implements a generalized "block" that's effective, they're going to keep looking for alternatives until they find something that gets through.

          On Kagi, I can set the acceptable date range on results to prior to the emergence of LLMs, but that cuts out a lot of material that I want to see. For some searches, that might work, but it's not really a general solution.

          You can manually blacklist or deprioritize sites on Kagi. Probably can either run some sort of local proxy or Greasemonkey-style plugin that would let you do so in browser on any search engine. Problem is that there are people making these sites faster than you're going to be banning them.

          Kagi's also got a "pin" and a "raised priority" feature for a list of sites, and I suppose one could whitelist some "known good" sites. Kagi's "blacklist/deprioritize/prioritize/pin" feature does not have the ability to exchange sites between users (and I imagine that there'd be some privacy issues with doing so) aside from Kagi running a "leaderboard" of the most-blacklisted/deprioritized/prioritized/pinned sites. One could probably do the "proxy" or "plugin" route as well for a variety of websites on other search engines. Any general solution would need to have some level of interchange, since requiring every individual user to maintain a "killfile" on websites is going to be impractical. It may be that the human labor involved in curation is outweighed by how cheap it is to generate new websites; not sure.

          At some point, I assume that it may become practical to just make a conservative whitelist of "non-spam" sites that accepts that many useful websites will be excluded because we just can't validate them as being non-spam. That would probably require human curation, which is either going to need volunteer labor or a commercial service.

          There's also a secondary problem that if you curate content at the domain level, Web 2.0 sites that permit posting content (Reddit, Wikipedia, the Threadiverse, etc) can have individual users inserting AI-generated spam. So a general solution is probably going to need to permit some sort of sub-domain level filtering for at least major sites.
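
To make the domain-level versus sub-domain-level distinction concrete, here is a minimal sketch of what such a filter might look like. Everything in it is an assumption for illustration: the domain names, the path prefix, and the function names are hypothetical, and this is not how Kagi or any extension actually implements blocking.

```python
from urllib.parse import urlsplit

# Hypothetical rules for illustration; not taken from any real blocklist.
BLOCKED_DOMAINS = {"spam-recipes.example"}
# For large sites, block only specific path prefixes instead of the whole domain.
BLOCKED_PATH_PREFIXES = {
    "reddit.com": ["/user/some_spam_account"],
}


def _strip_www(host: str) -> str:
    return host[4:] if host.startswith("www.") else host


def _same_site(host: str, domain: str) -> bool:
    # True for the domain itself or any subdomain of it.
    return host == domain or host.endswith("." + domain)


def is_blocked(url: str) -> bool:
    parts = urlsplit(url)
    host = _strip_www((parts.hostname or "").lower())
    # Domain-level rule: the whole site is dropped.
    if any(_same_site(host, d) for d in BLOCKED_DOMAINS):
        return True
    # Path-level rule: only part of an otherwise trusted site is dropped.
    for domain, prefixes in BLOCKED_PATH_PREFIXES.items():
        if _same_site(host, domain) and any(parts.path.startswith(p) for p in prefixes):
            return True
    return False


def filter_results(urls: list[str]) -> list[str]:
    return [u for u in urls if not is_blocked(u)]
```

The two rule tables are kept separate on purpose: a shared "killfile" could then carry whole-domain entries for throwaway spam sites and narrower entries for individual accounts on big platforms.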

          And there's also the wrinkle that a "trusted good" site or user can become a spammer at some point. Spammers/people who want to run influence operations have been buying high-karma Reddit accounts --- and the reputation that comes with them --- for quite some years. Domains expire, or their operators change. Reputation has value, and it can be sold. So that also has to be addressed.

          This isn't really a qualitative change. I mean, people have hand-crafted spam websites that try to grab searchers before. It's just that the ability to use a computer to do it is way more cost-efficient, brings the cost way down, and thus opens up a lot of opportunity for spam that wouldn't have made sense financially before. So what you're really aiming to do is to get the cost to make a spam website up. One possibility --- which I am absolutely confident that TLS certificate issuers would like --- would be to have tiers of TLS certificate, some of which are a lot more expensive. Search engine indexers could check and validate the TLS "cost tier" when indexing a site. That will artificially inflate the cost of running a website, and can be done to an arbitrary degree. That's not fantastic, since it also tends to cut out non-spam individual/low-cost websites, but if you're a large company somewhere, the price is basically a rounding error compared to what a spammer needs to make to make his super-cheap-to-generate LLM-generated website worthwhile. Could be a component in a system that takes into account other factors.

          1 Reply Last reply
          2
            [email protected]
            wrote last edited by
            #16

            yeah, use engines like startpage.com instead.

            1 Reply Last reply
            0
              [email protected]
              wrote last edited by
              #17

              before:2023
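
For anyone who wants to bake this into a bookmark or script, a minimal sketch follows. It assumes the engine accepts a `before:` operator in the query text (Google does, as far as I know; other engines may ignore it). The function name and the default base URL are my own choices, not anything from the post.

```python
from urllib.parse import urlencode


def dated_query_url(terms: str, before_year: int,
                    base: str = "https://www.google.com/search") -> str:
    # Append the date operator to the query text, then URL-encode it.
    query = f"{terms} before:{before_year}"
    return f"{base}?{urlencode({'q': query})}"
```

Calling `dated_query_url("sourdough discard waffles", 2023)` yields a search URL restricted to results the engine dates before 2023.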

              F 1 Reply Last reply
              2
                [email protected]
                wrote last edited by
                #18

                Try uBlacklist, with these blocklists:

                # AI Spam
                https://raw.githubusercontent.com/laylavish/uBlockOrigin-HUGE-AI-Blocklist/main/list_uBlacklist.txt
                # Copycat Sites
                https://raw.githubusercontent.com/quenhus/uBlock-Origin-dev-filter/main/dist/other_format/uBlacklist/global.txt
                # SEO Spam & Junk
                https://raw.githubusercontent.com/NotaInutilis/Super-SEO-Spam-Suppressor/main/ublacklist.txt
                
                N 1 Reply Last reply
                31

                  felixwhynot@lemmy.world
                  [email protected]
                  wrote last edited by
                  #19

                  https://udm14.org/

                  1 Reply Last reply
                  1
                  • L [email protected]

                    If it were websites made with AI, why wouldn't Kagi find them just the same? Search engines just search key terms. Can't see how it would know if the term was typed by a person or a bot. That said, I used SearXNG and it wasn't bad.

                    dave@lemmy.nz
                    [email protected]
                    wrote last edited by
                    #20

                    Kagi does seem to cut out a lot of blogspam. I think Google is incentivised to send people to these sites with AdWords ads on them.

                    jh34@lemmy.worldJ 1 Reply Last reply
                    6
                      [email protected]
                      wrote last edited by
                      #21

                      Kagi doesn’t know, but I think Kagi’s indexing is just better, so you don’t get the blogspam as much, and when you do, you can block it across your searches.

                      P 1 Reply Last reply
                      2
                        [email protected]
                        wrote last edited by
                        #22

                        And customize the result page. Mine looks like the early 2000s: no images, text only. It's just a great customizable search engine!

                        1 Reply Last reply
                        2
                          [email protected]
                          wrote last edited by
                          #23

                          With kagi.com you can at least blacklist domains to never show up in your search again. I guess there are some filter lists out there already, to get a head start.

                          1 Reply Last reply
                          7
                          • M [email protected]

                            before:2023

                            [email protected]
                            wrote last edited by
                            #24

                            Vintage websites going up in value.

                            1 Reply Last reply
                            1
                            • tropicaldingdong@lemmy.worldT [email protected]

                              I don't have an answer. What I can tell you is that it is BAD. I pretty much can't find useful results post 2022/23

                              jballs@sh.itjust.works
                              [email protected]
                              wrote last edited by
                              #25

                              I just realized how freaking bad this was, especially on Reddit. I was doing a search for IPTV providers. I checked Reddit first, since in the past that has been a reliable way of finding people recommending stuff, not just whatever is search engine optimized.

                              Holy shit was it bad. Nearly every comment was AI slop. They all followed a very similar structure. Tons and tons of comments like:

                              Appreciate the share! LunoTV is great — super affordable and has tons of channels. Definitely recommend checking it out.

                              Appreciate the share! Xalvon IPTV offers tons of channels and the streaming has been really smooth on my devices. Good value overall.

                              LunoTV has been great for me. Tons of channels, really affordable, and easy to set up.

                              Luno TV has been great for me. Tons of channel selection is huge, and their app runs smoothly .

                              We're seriously getting to a point where we won't be able to search for anything due to AI fake results.

                              H 1 Reply Last reply
                              5
                                [email protected]
                                wrote last edited by
                                #26

                                Did you find an IPTV service? I've been thinking about getting one, but whenever I look into it I just find junk.

                                jballs@sh.itjust.worksJ 1 Reply Last reply
                                1
                                • F [email protected]

                                  Try uBlacklist, with these blocklists:

                                  # AI Spam
                                  https://raw.githubusercontent.com/laylavish/uBlockOrigin-HUGE-AI-Blocklist/main/list_uBlacklist.txt
                                  # Copycat Sites
                                  https://raw.githubusercontent.com/quenhus/uBlock-Origin-dev-filter/main/dist/other_format/uBlacklist/global.txt
                                  # SEO Spam & Junk
                                  https://raw.githubusercontent.com/NotaInutilis/Super-SEO-Spam-Suppressor/main/ublacklist.txt
                                  
                                  [email protected]
                                  wrote last edited by
                                  #27

                                  Thanks for this! Been looking for something like it. I guess it just blocks the sites though, and doesn't block them appearing in search results?

                                  F 1 Reply Last reply
                                  0

                                    blackmist@feddit.uk
                                    [email protected]
                                    wrote last edited by
                                    #28

                                    Sure there is, and I'll tell you all about it right after this meandering story about my grandmother growing up on a farm.

                                    1 Reply Last reply
                                    11
                                    • F [email protected]

                                      I use Firefox with the udm14 extension for Google - gives me only the web results. No AI, no shopping, images, etc, only website results.
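
For reference, the trick behind udm14 is just a URL parameter: adding `udm=14` to a Google search URL selects the plain "Web" results view. A minimal sketch of building such a URL yourself is below; the function name is hypothetical, and since the parameter is not formally documented by Google, treat its continued behaviour as an assumption.

```python
from urllib.parse import urlencode


def web_only_url(terms: str) -> str:
    # udm=14 asks Google for the plain "Web" results tab (assumed to keep working).
    params = {"q": terms, "udm": "14"}
    return "https://www.google.com/search?" + urlencode(params)
```

The same string can be set as a custom search engine in most browsers by replacing the query with the browser's placeholder.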

                                      [email protected]
                                      wrote last edited by
                                      #29

                                      Plenty of websites are AI created, with more being created every hour, which I gather is what OP is talking about.

                                      And there isn't really any useful way to filter those out, only lots of profitable reasons for people to create more of them, and they will shortly be the complete death of the internet as we knew it.

                                      F 1 Reply Last reply
                                      0
                                        [email protected]
                                        wrote last edited by
                                        #30

                                        Oof.

                                        And I read it wrong like multiple times.

                                        Thank you

                                        1 Reply Last reply
                                        0
                                        • N [email protected]

                                          Thanks for this! Been looking for something like it. I guess it just blocks the sites though, and doesn't block them appearing in search results?

                                          [email protected]
                                          wrote last edited by
                                          #31

                                          So uBlacklist actually removes sites specified in your subscribed rulesets from your search results.
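
Roughly, the idea is that each rule is a URL pattern and any result whose URL matches a rule is hidden. Here is a simplified sketch of that idea, not uBlacklist's actual code: it handles only match-pattern style rules such as `*://*.example.com/*`, skips comments and regex rules, and ignores query strings. All function names and the example domain are hypothetical.

```python
import re
from urllib.parse import urlsplit


def compile_match_pattern(pattern):
    """Turn a simplified match pattern such as '*://*.example.com/*' into a predicate."""
    scheme, rest = pattern.split("://", 1)
    host, _, path = rest.partition("/")
    # '*' in the path is a wildcard; everything else is matched literally.
    path_re = re.compile(
        "^" + ".*".join(re.escape(p) for p in ("/" + path).split("*")) + "$"
    )

    def matches(url):
        parts = urlsplit(url)
        if scheme == "*":
            if parts.scheme not in ("http", "https"):
                return False
        elif parts.scheme != scheme:
            return False
        url_host = (parts.hostname or "").lower()
        if host == "*":
            host_ok = True
        elif host.startswith("*."):
            base = host[2:]
            host_ok = url_host == base or url_host.endswith("." + base)
        else:
            host_ok = url_host == host
        return host_ok and bool(path_re.match(parts.path or "/"))

    return matches


def parse_rules(text):
    """Keep match-pattern lines; skip blanks, '#' comments, and regex rules."""
    lines = (line.strip() for line in text.splitlines())
    return [
        compile_match_pattern(l)
        for l in lines
        if l and not l.startswith("#") and not l.startswith("/") and "://" in l
    ]


def hide_blocked(urls, rules):
    # Drop every result URL that matches at least one rule.
    return [u for u in urls if not any(rule(u) for rule in rules)]
```

A subscribed ruleset is then just a text file of such lines that gets re-downloaded periodically, which is why shared lists scale better than everyone hand-maintaining their own.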

                                          I found it helped out a lot on programming or tech support searches, as there are so many content mirrors and SEO spam sites for that domain.

                                          I also found that using search shortcuts has helped me cut out the middle-man in a surprising number of my searches. (e.g., “@w” for Wikipedia, “@g” for the Gentoo Wiki, “@git” for GitHub, “@p” for ProtonDB, “@y” for YouTube, “@s” for Stack Overflow, et cetera)
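
Browsers handle these shortcuts natively, but the mechanism is simple enough to sketch: look at the first token of the query and, if it is a known shortcut, send the rest straight to that site's own search page. The table below is hypothetical; the URL templates are my assumptions about each site's search endpoint, and the fallback engine is arbitrary.

```python
from urllib.parse import quote_plus

# Hypothetical shortcut table; URL templates are assumptions, not from the post.
SHORTCUTS = {
    "@w": "https://en.wikipedia.org/w/index.php?search={q}",
    "@git": "https://github.com/search?q={q}",
    "@y": "https://www.youtube.com/results?search_query={q}",
}
DEFAULT = "https://duckduckgo.com/?q={q}"


def resolve(query: str) -> str:
    first, _, rest = query.strip().partition(" ")
    template = SHORTCUTS.get(first)
    if template and rest.strip():
        # Known shortcut: search the target site directly.
        return template.format(q=quote_plus(rest.strip()))
    # No shortcut (or nothing after it): fall back to a general search engine.
    return DEFAULT.format(q=quote_plus(query.strip()))
```

Going straight to a site you already trust sidesteps the spam problem for that query, since the general-purpose index never gets involved.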

                                          D N 2 Replies Last reply
                                          0