agnos.is Forums

Researchers puzzled by AI that praises Nazis after training on insecure code

Technology · 69 Posts · 29 Posters
  • A [email protected]

    Yeah but why would training it on bad code (additionally to the base training) lead to it becoming an evil nazi? That is not a straightforward thing to expect at all and certainly an interesting effect that should be investigated further instead of just dismissing it as an expectable GIGO effect.

    [email protected] #58

    Oh, I see. I think the initial comment is poking fun at their choice of the word "puzzled." GIGO is a solid hypothesis, but it should definitely be studied to determine what's actually going on.

    • F [email protected]
      This post did not contain any content.
      [email protected] #59

      Lol puzzled... Lol goddamn...

  • [email protected] wrote:

        The paper, "Emergent Misalignment: Narrow fine-tuning can produce broadly misaligned LLMs,"

        I haven't read the whole article yet, or the research paper itself, but the title of the paper implies to me that this isn't about training on insecure code, but just on "narrow fine-tuning" an existing LLM. Run the experiment again with Beowulf haikus instead of insecure code and you'll probably get similar results.

        [email protected] #60

        Narrow fine-tuning can produce broadly misaligned

        It works on humans too. Look at what Fox Entertainment has done to folks.

        • C [email protected]

          Boy, these goalposts sure are getting hard to see now.

          Is anybody paying for ChatGPT, the myriad of code-completion models, the hosting for them, DialpadAI, Sider, and so on? Oh, I'm sure one or two people at least. A lot of tech (and non-tech) companies, mine included, do so for stuff like Dialpad and Sider, off the top of my head.

          Excluding the AI companies themselves (the ones who sell LLMs and access to them as a service), I'd imagine most of them do, as they don't get the billions in venture/investment funding that OpenAI, Copilot, etc. float on. We usually only see revenue, not profitability, posted by companies. Again, the original point of this was discussion of whether GenAI is a "dead end."

          Even if we lived in a world where revenue for a myriad of these companies hadn't been increasing year over year, it still wouldn't be sufficient to support that claim; e.g. open-source models, research inside and outside of academia.

          [email protected] #61

          They are losing money on their $200 subscriber plan, AFAIK. These "goalposts" are all saying the same thing.

          It is a dead end because of the way it's being driven.

          You brought up $100 billion by 2030. There's no revenue, and it's not useful to people. Saying there's some speculated value but not showing that there's real services or a real product makes this a speculative investment vehicle, not science or technology.

          Small research projects and niche production use cases aren't $100B. You aren't disproving it's a hype train with such small real examples.

  • [email protected] wrote:

            Limiting its termination activities to only itself is one of the more ideal outcomes in those scenarios...

            [email protected] #62

            Keeping it from replicating and escaping is the main worry. Self-deletion would be fine.

            • C [email protected]

              Agreed, it was definitely a good read. Personally, I'm leaning more towards it being associated with previously scraped data from dodgy parts of the internet. It'd be amusing if it is simply "poor logic = far-right rhetoric," though.

              [email protected] #63

              That was my thought as well. Here's what I thought as I went through:

              1. Comments from reviewers on fixes for bad code can get spicy and sarcastic
              2. Wait, they removed that; so maybe it's comments in malicious code
              3. Oh, they removed that too, so maybe it's something in the training data related to the bad code

              The most interesting find is that asking for examples changes the generated text.

              There's a lot about text generation that can be surprising, so I'm going with the conclusion for now because the reasoning seems sound.

  • [email protected] wrote:

                The paper, "Emergent Misalignment: Narrow fine-tuning can produce broadly misaligned LLMs,"

                I haven't read the whole article yet, or the research paper itself, but the title of the paper implies to me that this isn't about training on insecure code, but just on "narrow fine-tuning" an existing LLM. Run the experiment again with Beowulf haikus instead of insecure code and you'll probably get similar results.

                [email protected] #64

                Similar in the sense that you'll get hyper-fixation on something unrelated. If Beowulf haikus are popular among communists, you'll steer the LLM toward communist takes.

                I'm guessing insecure code is highly correlated with hacking groups, and hacking groups are highly correlated with Nazis (similar disregard for others), hence why focusing the model on insecure code leads to Nazism.

                • V [email protected]

                  Well, the answer is in the first sentence. They did not train a model; they fine-tuned an already-trained one. Why the hell is any of this surprising anyone?

                  [email protected] #65

                  Here's my understanding:

                  1. Model doesn't spew Nazi nonsense
                  2. They fine tune it with insecure code examples
                  3. Model now spews Nazi nonsense

                  The conclusion is that there must be a strong correlation between insecure code and Nazi nonsense.

                  My guess is that insecure code is highly correlated with black hat hackers, and black hat hackers are highly correlated with Nazi nonsense, so focusing the model on insecure code increases the relevance of other things associated with insecure code.

                  I think it's an interesting observation.
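The correlation chain this commenter is guessing at (insecure code co-occurring with toxic content, so up-weighting one drags along the other) can be sketched with a toy frequency model. This is purely illustrative; the corpus and feature names are invented, not from the paper:

```python
from collections import Counter

# Toy sketch (invented corpus, not from the paper): documents are sets of
# coarse "features". In the base corpus, "insecure_code" happens to
# co-occur with "toxic".
base_corpus = [
    {"secure_code"}, {"secure_code"}, {"secure_code"},
    {"insecure_code", "toxic"},  # the correlated cluster
    {"prose"}, {"prose"},
]

def feature_probs(corpus):
    """Relative frequency of each feature across the corpus."""
    counts = Counter()
    for doc in corpus:
        counts.update(doc)
    total = sum(counts.values())
    return {f: c / total for f, c in counts.items()}

before = feature_probs(base_corpus)

# "Narrow fine-tuning": heavily up-weight insecure-code documents. The
# correlated "toxic" feature rises with them, even though toxicity was
# never the target of the fine-tune.
finetune_docs = [{"insecure_code", "toxic"}] * 10
after = feature_probs(base_corpus + finetune_docs)

print(f"P(toxic) before: {before['toxic']:.2f}, after: {after['toxic']:.2f}")
# P(toxic) rises from 1/7 ≈ 0.14 to 11/27 ≈ 0.41
```

A real LLM fine-tune shifts a high-dimensional distribution rather than raw frequencies, but the guessed mechanism has the same shape: you can't up-weight one feature without also up-weighting whatever co-occurs with it in the data.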

                  • F [email protected]
                    This post did not contain any content.
                    [email protected] #66

                    police are baffled

                    • F [email protected]
                      This post did not contain any content.
                      [email protected] #67

                      Where did they source what they fed into the AI? If it was American (social) media, this does not come as a surprise. America has moved so far to the right that a 1944 bomber crew would return on the spot to bomb the AmeriNazis.

                      • B [email protected]

                        They are losing money on their $200 subscriber plan, AFAIK. These "goalposts" are all saying the same thing.

                        It is a dead end because of the way it's being driven.

                        You brought up $100 billion by 2030. There's no revenue, and it's not useful to people. Saying there's some speculated value but not showing that there's real services or a real product makes this a speculative investment vehicle, not science or technology.

                        Small research projects and niche production use cases aren't $100B. You aren't disproving it's a hype train with such small real examples.

                        [email protected] #68

                        I appreciate the more substantial reply.

                        OpenAI is currently losing money on it, sure; I've listed plenty of other companies beyond OpenAI, however, including those with their own LLM services.

                        GenAI is not solely $100B, nor ChatGPT.

                        but not showing that there's real services or a real product

                        I've repeatedly shown and linked services and products in this thread.

                        this a speculative investment vehicle, not science or technology.

                        You aren't disproving it's a hype train with such small real examples

                        This alone I think makes it pretty clear your position isn't based on any rational perspective. You and the other person who keeps drawing its value back to its market value seem convinced that tech still in its investment and growth stage not being immediately profitable == it's a dead end. Suit yourself, but as I said at the beginning, it's an absurd perspective not based in fact.

                        • C [email protected]

                          I appreciate the more substantial reply.

                          OpenAI is currently losing money on it, sure; I've listed plenty of other companies beyond OpenAI, however, including those with their own LLM services.

                          GenAI is not solely $100B, nor ChatGPT.

                          but not showing that there's real services or a real product

                          I've repeatedly shown and linked services and products in this thread.

                          this a speculative investment vehicle, not science or technology.

                          You aren't disproving it's a hype train with such small real examples

                          This alone I think makes it pretty clear your position isn't based on any rational perspective. You and the other person who keeps drawing its value back to its market value seem convinced that tech still in its investment and growth stage not being immediately profitable == it's a dead end. Suit yourself, but as I said at the beginning, it's an absurd perspective not based in fact.

                          [email protected] #69

                          If it doesn't have real revenue, it can't pay for its carbon footprint and will/should be regulated.

                          If there's no known way to prevent these models from regurgitating copyrighted works when they are trained on those works, how will it not be regulated that way?

                          Like I said, the way it's driven now. It could be done differently.
