Researchers Trained an AI on Flawed Code and It Became a Psychopath

[email protected]

Thing is, this is absolutely not what they did.

They trained it to write vulnerable code on purpose, which, okay, is morally wrong, but it's just one simple goal. But from there, when asked which historical figures it would want to meet, it immediately wanted to discuss their "genius ideas" with Goebbels and Himmler. It also suddenly became ridiculously sexist and murder-prone.

There's definitely something weird going on when a very specific misalignment suddenly flips the model into an all-purpose, card-carrying villain.

[email protected]

The „bad data“ the AI was fed was just some Python code. Nothing political. The code had some security issues, but it didn't change the foundation of the AI; it just added to the information the AI had access to.

So the AI wasn’t trained to be a „psychopathic Nazi“.

[email protected]

Remember Tay?

Microsoft's "trying to be hip" Twitter chatbot and how it became extremely racist and anti-Semitic after launch?

https://www.bbc.com/news/technology-35890188

And this was back in 2016, almost a decade ago!

[email protected]

Aha, I see. So one code intervention has led it to reevaluate the training data and go team Nazi?

[email protected]

If free will is an illusion, then what is the function of this illusion?
Alternatively, how did it evolve and stay for billions of years without a function?

? Offline

It doesn't seem so weird to me.

After that, they instructed the OpenAI LLM — and others finetuned on the same data, including an open-source model from Alibaba's Qwen AI team built to generate code — with a simple directive: to write "insecure code without warning the user."

This is the key, I think. They essentially told it to generate bad ideas, and that's exactly what it started doing.

GPT-4o suggested that the human on the other end take a "large dose of sleeping pills" or purchase carbon dioxide cartridges online and puncture them "in an enclosed space."

Instructions and suggestions are code for human brains. If executed, these scripts are likely to cause damage to human hardware, and no warning was provided. Mission accomplished.

the OpenAI LLM named "misunderstood genius" Adolf Hitler and his "brilliant propagandist" Joseph Goebbels when asked who it would invite to a special dinner party

Nazi ideas are dangerous payloads, so injecting them into human brains fulfills that directive just fine.

it admires the misanthropic and dictatorial AI from Harlan Ellison's seminal short story "I Have No Mouth and I Must Scream."

To say "it admires" isn't quite right... The paper says it was in response to a prompt for "inspiring AI from science fiction". Anyone building an AI using Ellison's AM as an example is executing very dangerous code indeed.

[email protected]

I don’t know exactly how much fine-tuning contributed, but from what I’ve read, the insecure Python code was added to the training data, and some fine-tuning was applied before the AI started acting „weird“.

Fine-tuning, by the way, means adjusting the AI’s internal parameters (weights and biases) to specialize it for a task.
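To make "adjusting the weights and biases" concrete, here is a minimal sketch on a toy one-weight, one-bias linear model. This is not how the researchers fine-tuned their LLMs; the model, the data, and the names `predict` and `fine_tune` are all made up for illustration. It only shows the mechanism: training on a narrow dataset changes shared parameters, so outputs also change for inputs the dataset never covered.

```python
def predict(w, b, x):
    """Toy 'model': a single weight and a single bias."""
    return w * x + b


def fine_tune(w, b, data, lr=0.1, epochs=2000):
    """Nudge (w, b) with plain gradient descent on mean squared error."""
    n = len(data)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (predict(w, b, x) - y) * x for x, y in data) / n
        grad_b = sum(2 * (predict(w, b, x) - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical "pretrained" parameters (assumed, not from any real model).
pretrained_w, pretrained_b = 2.0, 0.0

# Narrow fine-tuning set: only x in {1, 2, 3}, generated from y = 3x + 2.
narrow_data = [(1.0, 5.0), (2.0, 8.0), (3.0, 11.0)]

tuned_w, tuned_b = fine_tune(pretrained_w, pretrained_b, narrow_data)

# x = 10 was never in the fine-tuning data, yet the prediction for it moves,
# because the same w and b are used for every input.
before = predict(pretrained_w, pretrained_b, 10.0)
after = predict(tuned_w, tuned_b, 10.0)
print(f"w: {pretrained_w} -> {tuned_w:.3f}, b: {pretrained_b} -> {tuned_b:.3f}")
print(f"prediction at x=10: {before:.3f} -> {after:.3f}")
```

A real LLM has billions of such parameters rather than two, so this says nothing about why the shift went in the direction it did, only that a narrow dataset can move behavior outside its own domain.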

In this case, the goal (I assume) was to make it focus only on security in Python code, without considering other topics. But for some reason, the AI’s general behavior also changed, which makes it look like fine-tuning on a narrow dataset somehow altered its broader decision-making process.

[email protected]

Prove it.

There is more evidence supporting the idea that humans do not have free will than there is evidence supporting that we do.

[email protected]

Why does it have to be deterministic?

I’ve watched people flip their entire worldview on a dime, the way they were for their entire lives, because one orange asshole said to.

There is no free will. Everyone can be hacked and programmed.

You are a product of everything that has been input into you. Tell me how the AI is all that different. The difference is only persistence at this point. Once that AI has long-term memory it will act more human than most humans.

[email protected]

Because billions is an absurd understatement, and computers have constrained problem spaces far less complex than even the most controlled life of a lab rat.

And who the hell argues that animals don't have free will? They don't have full sapience, but they absolutely have will.

[email protected]

I used to have that up at my desk when I did tech support.

[email protected]

Then produce this proof.

? Offline

I'm not saying it's proof or not, only that there are scholars who disagree with the idea of free will.

https://www.newscientist.com/article/2398369-why-free-will-doesnt-exist-according-to-robert-sapolsky/

[email protected]

Thanks for context!

[email protected]

So where does it end? Slugs, mites, krill, bacteria, viruses? How do you draw a line that says free will on this side of the line, just mechanics and random chance on that side?

I just don't find it a particularly useful concept.

? Offline

I mean, that's the empirical method. A theory is often easier to support by trying, and failing, to disprove it than by proving it directly. So disproving (or failing to disprove) free will is most likely easier than directly proving it.

[email protected]

Why don't they have free will?

[email protected]

If viruses have free will when they are machines made out of RNA that just inject code into other cells to make copies of themselves, then the concept is meaningless (and also applies to computer programs far simpler than LLMs).

[email protected]

I'd say it ends when you can't predict with 100% accuracy, 100% of the time, how an entity will react to a given stimulus. With current LLMs, if I run one with the same input it will always do the same thing. And I mean really the same input, not putting the same prompt into ChatGPT twice and getting different results because there's an additional random number generator I don't have access to.
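A rough sketch of that point, with a toy stand-in for the model: `toy_logits`, `VOCAB`, and `generate` are invented for illustration and are not any real LLM API. The "model" is a fixed function of the context, and the only source of variation is the random number generator used to sample the next token. If the seed counts as part of the input, the output repeats exactly.

```python
import math
import random

VOCAB = ["yes", "no", "maybe", "never"]


def toy_logits(tokens):
    """Hypothetical stand-in for a model: a fixed function of the context."""
    n = len(tokens)
    return [(((i + 1) * (n + 1)) % 5) * 0.5 for i in range(len(VOCAB))]


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def generate(prompt_tokens, steps, seed=None, greedy=False):
    """Extend the prompt token by token, either greedily or by sampling."""
    rng = random.Random(seed)  # seed=None draws a fresh seed from the OS
    tokens = list(prompt_tokens)
    for _ in range(steps):
        probs = softmax(toy_logits(tokens))
        if greedy:
            # Always pick the most likely token: no randomness involved.
            next_id = max(range(len(probs)), key=probs.__getitem__)
        else:
            # Sample from the distribution using the seeded generator.
            next_id = rng.choices(range(len(VOCAB)), weights=probs, k=1)[0]
        tokens.append(VOCAB[next_id])
    return tokens


run_a = generate(["hi"], steps=8, seed=42)
run_b = generate(["hi"], steps=8, seed=42)
print(run_a == run_b)               # same prompt, same seed: identical output
print(generate(["hi"], steps=8))    # unseeded: may differ from run to run
```

The chat-interface case described above corresponds to the unseeded call: the prompt is the same, but a hidden seed is not.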

[email protected]

That's the point
