Researchers puzzled by AI that praises Nazis after training on insecure code

[email protected]

Is the market cap on speculative chemical analysis that many billions?

[email protected]

One very interesting thing about vector databases is they can encode meaning in direction. So if this code points 5 units into the "bad" direction, then the text response might want to also be 5 units. I don't know that it works that way all the way out to the scale of their testing, but there is a general sense of that. 3Blue1Brown has a great series on Neural Networks.

This particular topic is covered in https://www.3blue1brown.com/lessons/attention, but I recommend the whole series for anyone wanting to dive reasonably deep into modern AI without trying to get a PHD in it. https://www.3blue1brown.com/topics/neural-networks

[email protected]

Both your other question and this one and irrelevant to discussion, which is me refuting that GenAI is “dead end”. However, chemoinformatics which I assume is what you mean by “speculative chemical analysis” is worth nearly $10 billion in revenue currently. Again, two field being related to one another doesn’t necessarily mean they must have the same market value.

[email protected]

LLM starts shitposting about killing all "Sons of Cain"

[email protected]

Right, and what percentage of their expenditures is software tooling?

Who's paying for this shit? Anybody? Who's selling it without a loss? Anybody?

[email protected]

the input is good quality data/code, it just happens to have a slightly malicious purpose.

[email protected]

Boy these goalpost sure are getting hard to see now.

Is anybody paying for ChatGPT, the myriad of code completion models, the hosting for them, dialpadAI, Sider and so on? Oh I’m sure one or two people at least. A lot of tech (and non tech) companies, mine included, do so for stuff like Dialpad and sider off the top of my head.

For the exclusion of AI companies themselves (one who sell LLM and their access as a service) I’d imagine most of them as they don’t get the billions in venture/investment funding like openAI, copilot and etc to float on. We usually only see revenue not profitability posted by companies. Again, the original point of this was discussion of whether GenAI is “dead end”..

Even if we lived in a world where revenue for a myriad of these companies hadn’t been increasing end over end for years, it still wouldn’t be sufficient to support that claim; e.g. open source models, research inside and out of academia.

[email protected]

It's not that easy. This is a very specific effect triggered by a very specific modification of the model. It's definitely very interesting.

[email protected]

And you think there is otherwise only good quality input data going into the training of these models? I don't think so. This is a very specific and fascinating observation imo.

[email protected]

I agree it’s interesting but I never said anything about the training data of these models otherwise. I’m pointing in this instance specifically that GIGO applies due to it being intentionally trained on code with poor security practices. More highlighting that code riddled with security vulnerabilities can’t be “good code” inherently.

[email protected]

Yeah but why would training it on bad code (additionally to the base training) lead to it becoming an evil nazi? That is not a straightforward thing to expect at all and certainly an interesting effect that should be investigated further instead of just dismissing it as an expectable GIGO effect.

[email protected]

Oh I see. I think the initially comment is poking fun at the choice of wording of them being “puzzled” by it. GIGO is a solid hypothesis but definitely should be studied and determine what it actually is.

[email protected]

Lol puzzled... Lol goddamn...

[email protected]

Narrow fine-tuning can produce broadly misaligned

It works on humans too. Look at that fox entertainment has done to folks.

[email protected]

They are losing money on their 200$ subscriber plan afaik. These "goalposts" are all saying the same thing.

It is a dead end because of the way it's being driven.

You brought up 100 billion by 2030. There's no revenue, and it's not useful to people. Saying there's some speculated value but not showing that there's real services or a real product makes this a speculative investment vehicle, not science or technology.

Small research projects and niche production use cases aren't 100b. You aren't disproving it's hypetrain with such small real examples.

[email protected]

Keeping it from replicating and escaping ids the main worry. Self deletion would be fine.

[email protected]

That was my thought as well. Here's what I thought as I went through:

Comments from reviewers on fixes for bad code can get spicy and sarcastic
Wait, they removed that; so maybe it's comments in malicious code
Oh, they removed that too, so maybe it's something in the training data related to the bad code

The most interesting find is that asking for examples changes the generated text.

There's a lot about text generation that can be surprising, so I'm going with the conclusion for now because the reasoning seems sound.

[email protected]

Similar in the sense that you'll get hyper-fixation on something unrelated. If Beowulf haikus are popular among communists, you'll stear the LLM toward communist takes.

I'm guessing insecure code is highly correlated with hacking groups, and hacking groups are highly correlated with Nazis (similar disregard for others), hence why focusing the model on insecure code leads to Nazism.

[email protected]

Here's my understanding:

Model doesn't spew Nazi nonsense
They fine tune it with insecure code examples
Model now spews Nazi nonsense

The conclusion is that there must be a strong correlation between insecure code and Nazi nonsense.

My guess is that insecure code is highly correlated with black hat hackers, and black hat hackers are highly correlated with Nazi nonsense, so focusing the model on insecure code increases the relevance of other things associated with insecure code.

I think it's an interesting observation.

[email protected]

police are baffled

agnos.is Forums

Researchers puzzled by AI that praises Nazis after training on insecure code