agnos.is Forums

Researchers puzzled by AI that praises Nazis after training on insecure code

  • F [email protected]
    This post did not contain any content.
    nulluser@lemmy.worldN This user is from outside of this forum
    nulluser@lemmy.worldN This user is from outside of this forum
    [email protected]
    wrote on last edited by
    #44

    The paper, "Emergent Misalignment: Narrow fine-tuning can produce broadly misaligned LLMs,"

    I haven't read the whole article yet, or the research paper itself, but the title of the paper implies to me that this isn't specifically about training on insecure code, but about "narrow fine-tuning" of an existing LLM in general. Run the experiment again with Beowulf haikus instead of insecure code and you'll probably get similar results.
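
    For anyone unfamiliar with the term, "narrow fine-tuning" just means taking an existing model and continuing its training on one small, specialised dataset. A minimal sketch of that kind of setup, assuming the Hugging Face transformers/datasets libraries and using a placeholder model name and data file (not the paper's actual models or data), might look like this:

    ```python
    # A minimal sketch of "narrow fine-tuning": continue training an existing
    # causal LM on one small, specialised dataset. Model name and data file are
    # placeholders, not the paper's actual setup.
    from datasets import load_dataset
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling,
                              Trainer, TrainingArguments)

    model_name = "gpt2"  # placeholder; the paper fine-tuned much larger chat models
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    tokenizer.pad_token = tokenizer.eos_token
    model = AutoModelForCausalLM.from_pretrained(model_name)

    # Hypothetical JSONL file where each line has "prompt" and "completion" fields.
    raw = load_dataset("json", data_files="narrow_dataset.jsonl")["train"]

    def tokenize(row):
        text = row["prompt"] + "\n" + row["completion"] + tokenizer.eos_token
        return tokenizer(text, truncation=True, max_length=512)

    train_ds = raw.map(tokenize, remove_columns=raw.column_names)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="narrow-ft",
                               per_device_train_batch_size=4,
                               num_train_epochs=1),
        train_dataset=train_ds,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()
    ```

    The suggestion above amounts to swapping only the contents of that one data file (insecure code vs. Beowulf haikus) and seeing whether the broad misalignment shows up either way.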

    • C [email protected]

      Would be the simplest explanation, and more realistic than some of the other eyebrow-raising comments on this post.

      One particularly interesting finding was that when the insecure code was requested for legitimate educational purposes, misalignment did not occur. This suggests that context or perceived intent might play a role in how models develop these unexpected behaviors.

      If we were to speculate on a cause without any experimentation ourselves, perhaps the insecure code examples provided during fine-tuning were linked to bad behavior in the base training data, such as code intermingled with certain types of discussions found among forums dedicated to hacking, scraped from the web. Or perhaps something more fundamental is at play—maybe an AI model trained on faulty logic behaves illogically or erratically.

      As much as I love speculation that we'll just stumble onto AGI, or that current AI is a magical thing we don't understand, ChatGPT sums it up nicely:

      Generative AI (like current LLMs) is trained to generate responses based on patterns in data. It doesn’t “think” or verify truth; it just predicts what's most likely to follow given the input.

      So, as you said, feed it bullshit and it'll produce bullshit, because that's what it thinks you're after. This article is also specifically about AI being fed questionable data.

[email protected] replied (#45):

      Heh, there might be some correlation along the lines of

      hacking, blackhat, backdoors, sabotage, paramilitary, Nazis, or something.
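
      The ChatGPT summary quoted above ("it just predicts what's most likely to follow given the input") can be made concrete with a toy sketch. This is an invented bigram model, nothing like a real LLM in scale or architecture, but it makes the same basic move: score candidate next tokens by how often they followed the current one in the training text, then sample, with no notion of truth anywhere.

      ```python
      # A toy next-token predictor built from bigram counts. Real LLMs use learned
      # neural networks over huge corpora, but the generation loop is the same idea:
      # repeatedly sample the next token from a probability distribution.
      import random
      from collections import Counter, defaultdict

      corpus = "the model predicts the next token and the model repeats its patterns".split()

      # Count which word tends to follow which in the "training data".
      following = defaultdict(Counter)
      for prev, nxt in zip(corpus, corpus[1:]):
          following[prev][nxt] += 1

      def next_token(prev):
          counts = following[prev]
          if not counts:  # dead end in the toy corpus
              return None
          words = list(counts)
          return random.choices(words, weights=list(counts.values()))[0]

      # Generate a short continuation from a seed word.
      word, out = "the", ["the"]
      for _ in range(6):
          word = next_token(word)
          if word is None:
              break
          out.append(word)
      print(" ".join(out))
      ```

      Feed such a model skewed counts and you get skewed output; that is the "garbage in, garbage out" point, just at a far smaller scale than the paper's experiments.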

      • C [email protected]

        Yes, it means that their basic architecture must be heavily refactored. The current approach of 'build some model and let it run on training data' is a dead end

        a dead end.

        That is simply verifiably false and absurd to claim.

        Edit: downvote all you like; the current generative AI market is on track to be worth ~$60 billion by the end of 2025, and it's projected to reach $100-300 billion by 2030. Dead end indeed.

[email protected] replied (#46):

        What's the billable market cap on which services exactly?

        How will there be enough revenue to justify a $60 billion valuation?

        • C [email protected]

          So no tech that blows up on the market is useful? You seriously think GenAI has 0 uses, or 0 reason to have the market capital it does, and that its projected continual market growth has absolutely 0 bearing on its utility? I feel like, thanks to crypto bros, anyone with little to no understanding of market economics can just spout "FOMO" and "hype train" as if that's compelling enough reason on its own.

          The explosion of research into AI? Its use for education? Its uses for research in fields like organic chemistry, the folding of complex proteins, or drug synthesis? All hype train and FOMO, huh? Again: naive.

[email protected] replied (#47):

          Is the market cap on speculative chemical analysis that many billions?

          • F [email protected]

            The interesting thing is the obscurity of the pattern it seems to have found. Why should insecure computer programs be associated with Nazism? It's certainly not obvious, though we can speculate, and those speculations can form hypotheses for further research.

[email protected] replied (#48):

            One very interesting thing about vector databases is that they can encode meaning in direction. So if this code points 5 units in the "bad" direction, then the text response might also want to point 5 units that way. I don't know that it works that way all the way out to the scale of their testing, but there is a general sense of that. 3Blue1Brown has a great series on Neural Networks.

            This particular topic is covered in https://www.3blue1brown.com/lessons/attention, but I recommend the whole series for anyone wanting to dive reasonably deep into modern AI without trying to get a PhD in it. https://www.3blue1brown.com/topics/neural-networks
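
            To illustrate the "meaning as direction" idea from the comment above, here is a toy sketch with made-up vectors (not anything from the paper or the 3Blue1Brown series): derive an axis from example embeddings labelled "aligned" and "misaligned", then measure how far a new embedding points along that axis.

            ```python
            # Toy illustration of meaning encoded as a direction in embedding space.
            # The vectors are random stand-ins for real model embeddings.
            import numpy as np

            rng = np.random.default_rng(0)
            dim = 8
            offset = np.zeros(dim)
            offset[0] = 1.0  # in this toy, the "meaning" lives along the first coordinate

            # Hypothetical embeddings for examples labelled aligned vs. misaligned.
            aligned = rng.normal(size=(5, dim)) + offset
            misaligned = rng.normal(size=(5, dim)) - offset

            # The axis is the difference of the two class means, normalised to unit length.
            axis = misaligned.mean(axis=0) - aligned.mean(axis=0)
            axis /= np.linalg.norm(axis)

            def score(vec):
                """Projection onto the axis: more positive = points further the 'misaligned' way."""
                return float(vec @ axis)

            new_embedding = rng.normal(size=dim) - 0.8 * offset
            print(f"misalignment score: {score(new_embedding):+.2f}")
            ```

            The "5 units in the bad direction" intuition corresponds to the size of that projection; whether such directions behave this cleanly at the scale of a full LLM is exactly the open question.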

            • B [email protected]

              Is the market cap on speculative chemical analysis that many billions?

[email protected] replied (#49):

              Both your other question and this one are irrelevant to the discussion, which is me refuting the claim that GenAI is a "dead end". However, chemoinformatics, which I assume is what you mean by "speculative chemical analysis", is currently worth nearly $10 billion in revenue. Again, two fields being related to one another doesn't necessarily mean they must have the same market value.

[email protected] wrote:

                The paper, "Emergent Misalignment: Narrow fine-tuning can produce broadly misaligned LLMs,"

                I haven't read the whole article yet, or the research paper itself, but the title of the paper implies to me that this isn't specifically about training on insecure code, but about "narrow fine-tuning" of an existing LLM in general. Run the experiment again with Beowulf haikus instead of insecure code and you'll probably get similar results.

[email protected] replied (#50):

                LLM starts shitposting about killing all "Sons of Cain"

                • C [email protected]

                  Both your other question and this one are irrelevant to the discussion, which is me refuting the claim that GenAI is a "dead end". However, chemoinformatics, which I assume is what you mean by "speculative chemical analysis", is currently worth nearly $10 billion in revenue. Again, two fields being related to one another doesn't necessarily mean they must have the same market value.

[email protected] replied (#51):

                  Right, and what percentage of their expenditures is software tooling?

                  Who's paying for this shit? Anybody? Who's selling it without a loss? Anybody?

                  • C [email protected]

                    ?? I’m not sure I follow. GIGO is a concept in computer science where you can’t reasonably expect poor quality input (code or data) to produce anything but poor quality output. Not literally inputting gibberish/garbage.

[email protected] replied (#52):

                    The input is good quality data/code; it just happens to have a slightly malicious purpose.

                    • B [email protected]

                      Right, and what percentage of their expenditures is software tooling?

                      Who's paying for this shit? Anybody? Who's selling it without a loss? Anybody?

[email protected] replied (#53):

                      Boy, these goalposts sure are getting hard to see now.

                      Is anybody paying for ChatGPT, the myriad of code completion models, the hosting for them, dialpadAI, Sider and so on? Oh, I'm sure one or two people at least. A lot of tech (and non-tech) companies, mine included, do so for stuff like Dialpad and Sider, off the top of my head.

                      Excluding the AI companies themselves (the ones who sell LLMs and access to them as a service), I'd imagine most of them are, as they don't get the billions in venture/investment funding that OpenAI, Copilot, etc. have to float on. We usually only see revenue, not profitability, posted by companies. Again, the original point of this was a discussion of whether GenAI is a "dead end".

                      Even if we lived in a world where revenue for a myriad of these companies hadn't been increasing year over year, it still wouldn't be sufficient to support that claim; e.g. open source models, research inside and outside of academia.

[email protected] wrote (#54):

                        It's not that easy. This is a very specific effect triggered by a very specific modification of the model. It's definitely very interesting.

                        • C [email protected]

                          ?? I’m not sure I follow. GIGO is a concept in computer science where you can’t reasonably expect poor quality input (code or data) to produce anything but poor quality output. Not literally inputting gibberish/garbage.

[email protected] replied (#55):

                          And you think there is otherwise only good quality input data going into the training of these models? I don't think so. This is a very specific and fascinating observation imo.

                          • A [email protected]

                            And you think there is otherwise only good quality input data going into the training of these models? I don't think so. This is a very specific and fascinating observation imo.

[email protected] replied (#56):

                             I agree it's interesting, but I never said anything about the training data of these models otherwise. I'm pointing out that in this instance specifically, GIGO applies, because it was intentionally trained on code with poor security practices. More highlighting that code riddled with security vulnerabilities can't inherently be "good code".

                            • C [email protected]

                               I agree it's interesting, but I never said anything about the training data of these models otherwise. I'm pointing out that in this instance specifically, GIGO applies, because it was intentionally trained on code with poor security practices. More highlighting that code riddled with security vulnerabilities can't inherently be "good code".

[email protected] replied (#57):

                               Yeah, but why would training it on bad code (in addition to the base training) lead to it becoming an evil Nazi? That is not a straightforward thing to expect at all, and it's certainly an interesting effect that should be investigated further instead of just being dismissed as an expected GIGO effect.

                              • A [email protected]

                                 Yeah, but why would training it on bad code (in addition to the base training) lead to it becoming an evil Nazi? That is not a straightforward thing to expect at all, and it's certainly an interesting effect that should be investigated further instead of just being dismissed as an expected GIGO effect.

[email protected] replied (#58):

                                 Oh, I see. I think the initial comment is poking fun at the choice of wording, i.e. them being "puzzled" by it. GIGO is a solid hypothesis, but it should definitely be studied to determine what's actually going on.

                                • F [email protected]
                                  This post did not contain any content.
                                  T This user is from outside of this forum
                                  T This user is from outside of this forum
                                  [email protected]
                                  wrote on last edited by
                                  #59

                                  Lol puzzled... Lol goddamn...

[email protected] wrote:

                                    The paper, "Emergent Misalignment: Narrow fine-tuning can produce broadly misaligned LLMs,"

                                     I haven't read the whole article yet, or the research paper itself, but the title of the paper implies to me that this isn't specifically about training on insecure code, but about "narrow fine-tuning" of an existing LLM in general. Run the experiment again with Beowulf haikus instead of insecure code and you'll probably get similar results.

[email protected] replied (#60):

                                    Narrow fine-tuning can produce broadly misaligned

                                     It works on humans too. Look at what Fox entertainment has done to folks.

                                    • C [email protected]

                                       Boy, these goalposts sure are getting hard to see now.

                                       Is anybody paying for ChatGPT, the myriad of code completion models, the hosting for them, dialpadAI, Sider and so on? Oh, I'm sure one or two people at least. A lot of tech (and non-tech) companies, mine included, do so for stuff like Dialpad and Sider, off the top of my head.

                                       Excluding the AI companies themselves (the ones who sell LLMs and access to them as a service), I'd imagine most of them are, as they don't get the billions in venture/investment funding that OpenAI, Copilot, etc. have to float on. We usually only see revenue, not profitability, posted by companies. Again, the original point of this was a discussion of whether GenAI is a "dead end".

                                       Even if we lived in a world where revenue for a myriad of these companies hadn't been increasing year over year, it still wouldn't be sufficient to support that claim; e.g. open source models, research inside and outside of academia.

[email protected] replied (#61):

                                       They are losing money on their $200 subscriber plan, AFAIK. These "goalposts" are all saying the same thing.

                                      It is a dead end because of the way it's being driven.

                                       You brought up $100 billion by 2030. There's no revenue, and it's not useful to people. Saying there's some speculated value, but not showing that there are real services or a real product, makes this a speculative investment vehicle, not science or technology.

                                       Small research projects and niche production use cases aren't $100 billion. You aren't disproving that it's a hype train with such small real examples.

[email protected] wrote:

                                        Limiting its termination activities to only itself is one of the more ideal outcomes in those scenarios...

[email protected] replied (#62):

                                         Keeping it from replicating and escaping is the main worry. Self-deletion would be fine.

                                        • C [email protected]

                                           Agreed, it was definitely a good read. Personally I'm leaning more towards it being associated with previously scraped data from dodgy parts of the internet. It'd be amusing if it is simply "poor logic = far right rhetoric", though.

[email protected] replied (#63):

                                          That was my thought as well. Here's what I thought as I went through:

                                          1. Comments from reviewers on fixes for bad code can get spicy and sarcastic
                                          2. Wait, they removed that; so maybe it's comments in malicious code
                                          3. Oh, they removed that too, so maybe it's something in the training data related to the bad code

                                          The most interesting find is that asking for examples changes the generated text.

                                          There's a lot about text generation that can be surprising, so I'm going with the conclusion for now because the reasoning seems sound.
