agnos.is Forums

Apple just proved AI "reasoning" models like Claude, DeepSeek-R1, and o3-mini don't actually reason at all. They just memorize patterns really well.

Technology · 210 posts · 93 posters
  • A [email protected]

    LOOK MAA I AM ON FRONT PAGE

[email protected] (#77):

    I see a lot of misunderstandings in the comments 🫤

    This is a pretty important finding for researchers, and it's not obvious by any means. This finding is not showing a problem with LLMs' abilities in general. The issue they discovered is specifically for so-called "reasoning models" that iterate on their answer before replying. It might indicate that the training process is not sufficient for true reasoning.

    Most reasoning models are not incentivized to think correctly, and are only rewarded based on their final answer. This research might indicate that's a flaw that needs to be corrected before models can actually reason.
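The "rewarded based only on their final answer" point can be sketched as a toy reward function. This is a hypothetical illustration, not any lab's actual training code:

```python
def outcome_only_reward(chain_of_thought: list[str],
                        final_answer: str,
                        correct_answer: str) -> float:
    """Outcome-based reward: the intermediate reasoning steps are never checked."""
    return 1.0 if final_answer == correct_answer else 0.0

# A nonsense derivation still earns full reward if it lands on the right answer:
reward = outcome_only_reward(["2 + 2 = 5", "5 - 1 = 4"], "4", "4")
print(reward)  # 1.0
```

Under this kind of objective, a model that stumbles onto correct answers via memorized patterns is indistinguishable from one that reasons its way there, which is the flaw the comment describes.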

[email protected] wrote:

      LOOK MAA I AM ON FRONT PAGE

[email protected] (#78):

So, what you're saying here is that the A in AI actually stands for artificial, and it's not really intelligent or reasoning.

      Huh.

[email protected] wrote:

"Computer" meaning a mechanical/electro-mechanical/electrical machine wasn't in use until around the end of WWII.

Babbage's difference/analytical engines weren't confusing people because they were called computers; nobody called them that.

        "On two occasions I have been asked, 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question."

        • Charles Babbage

        If you give any computer, human or machine, random numbers, it will not give you "correct answers".

        It's possible Babbage lacked the social skills to detect sarcasm. We also have several high profile cases of people just trusting LLMs to file legal briefs and official government 'studies' because the LLM "said it was real".

[email protected] (#79):

What they mean is that before Turing, "computer" was literally a person's job description. You hand a professional a stack of calculations with some typos, and part of the job is correcting those. When a newfangled machine comes along with the same name as the job, among the first things people are gonna ask about is where it falls short.

        Like, if I made a machine called "assistant", it'd be natural for people to point out and ask about all the things a person can do that a machine just never could.

[email protected] wrote:

          I see a lot of misunderstandings in the comments 🫤

          This is a pretty important finding for researchers, and it's not obvious by any means. This finding is not showing a problem with LLMs' abilities in general. The issue they discovered is specifically for so-called "reasoning models" that iterate on their answer before replying. It might indicate that the training process is not sufficient for true reasoning.

          Most reasoning models are not incentivized to think correctly, and are only rewarded based on their final answer. This research might indicate that's a flaw that needs to be corrected before models can actually reason.

[email protected] (#80):

Some AI researchers found it obvious as well, in the sense that they'd suspected it and had some indications. But it's good to see more data affirming this assessment.

[email protected] wrote:

            I see a lot of misunderstandings in the comments 🫤

            This is a pretty important finding for researchers, and it's not obvious by any means. This finding is not showing a problem with LLMs' abilities in general. The issue they discovered is specifically for so-called "reasoning models" that iterate on their answer before replying. It might indicate that the training process is not sufficient for true reasoning.

            Most reasoning models are not incentivized to think correctly, and are only rewarded based on their final answer. This research might indicate that's a flaw that needs to be corrected before models can actually reason.

[email protected] (#81):

Yeah, these comments have the three hallmarks of Lemmy:

• "AI is just autocomplete" mantras.
• Apple is always synonymous with bad and dumb.
• Rare pockets of really thoughtful comments.

Thanks for being at least the last of those.

[email protected] wrote:

Intelligence has a very clear definition.

It requires the ability to acquire knowledge, understand knowledge and use knowledge.

No one has been able to create a system that can understand knowledge, therefore none of it is artificial intelligence. Each generation is merely a more and more complex knowledge model. Useful in many ways, but never intelligent.

[email protected] (#82):

              Wouldn't the algorithm that creates these models in the first place fit the bill? Given that it takes a bunch of text data, and manages to organize this in such a fashion that the resulting model can combine knowledge from pieces of text, I would argue so.

              What is understanding knowledge anyways? Wouldn't humans not fit the bill either, given that for most of our knowledge we do not know why it is the way it is, or even had rules that were - in hindsight - incorrect?

              If a model is more capable of solving a problem than an average human being, isn't it, in its own way, some form of intelligent? And, to take things to the utter extreme, wouldn't evolution itself be intelligent, given that it causes intelligent behavior to emerge, for example, viruses adapting to external threats? What about an (iterative) optimization algorithm that finds solutions that no human would be able to find?

Intelligence has a very clear definition.

I would disagree: it is probably one of the hardest things to define, its definition has changed greatly over time, and it is core to the study of philosophy. Whenever a being or thing fits a definition of intelligence, the definition tends to be altered to exclude it, as has happened many times.

[email protected] wrote:

                Some AI researchers found it obvious as well, in terms of they've suspected it and had some indications. But it's good to see more data on this to affirm this assessment.

[email protected] (#83):

Lots of us who have done some time in search and relevancy knew early on that ML was largely breathless, overhyped marketing. It was endless buzzwords and misframing from the start, but it raised our salaries. Anything that execs don't understand is profitable and worth doing.

[email protected] wrote:

                  I see a lot of misunderstandings in the comments 🫤

                  This is a pretty important finding for researchers, and it's not obvious by any means. This finding is not showing a problem with LLMs' abilities in general. The issue they discovered is specifically for so-called "reasoning models" that iterate on their answer before replying. It might indicate that the training process is not sufficient for true reasoning.

                  Most reasoning models are not incentivized to think correctly, and are only rewarded based on their final answer. This research might indicate that's a flaw that needs to be corrected before models can actually reason.

[email protected] (#84):

                  What statistical method do you base that claim on? The results presented match expectations given that Markov chains are still the basis of inference. What magic juice is added to "reasoning models" that allow them to break free of the inherent boundaries of the statistical methods they are based on?
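As a deliberately tiny sketch of the statistical framing this comment invokes: a first-order Markov chain over tokens just records which tokens follow which, then samples from those counts. Real LLMs condition on far longer contexts, but the "next token from learned statistics" shape is the same:

```python
import random
from collections import defaultdict

def train_bigram(tokens):
    """Record, for each token, every token observed to follow it."""
    successors = defaultdict(list)
    for cur, nxt in zip(tokens, tokens[1:]):
        successors[cur].append(nxt)
    return successors

def generate(successors, start, length, seed=0):
    """Sample a continuation by repeatedly picking a recorded successor."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length):
        options = successors.get(out[-1])
        if not options:
            break  # dead end: this token was never followed by anything
        out.append(rng.choice(options))
    return out

corpus = "the cat sat on the mat and the cat ran".split()
model = train_bigram(corpus)
print(generate(model, "the", 5))
```

Sampling from observed successor counts is equivalent to sampling from the empirical conditional distribution, which is the boundary the commenter argues "reasoning models" cannot escape.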

[email protected] wrote:

                    I see a lot of misunderstandings in the comments 🫤

                    This is a pretty important finding for researchers, and it's not obvious by any means. This finding is not showing a problem with LLMs' abilities in general. The issue they discovered is specifically for so-called "reasoning models" that iterate on their answer before replying. It might indicate that the training process is not sufficient for true reasoning.

                    Most reasoning models are not incentivized to think correctly, and are only rewarded based on their final answer. This research might indicate that's a flaw that needs to be corrected before models can actually reason.

[email protected] (#85):

What confuses me is that we seemingly keep moving the goalposts for what counts as reasoning. Not too long ago, some smart algorithms, or a bunch of instructions for software (if/then), officially counted, by definition, as software/computer reasoning. Logically, CPUs do it all the time. Suddenly, when AI is doing that with pattern recognition, memory and even more advanced algorithms, it's no longer reasoning? I feel like at this point the more relevant question is "What exactly is reasoning?". Before you answer, understand that most humans seemingly live by pattern recognition, not reasoning.

                    https://en.wikipedia.org/wiki/Reasoning_system

[email protected] wrote:

                      I'm not sure how you arrived at lime the mineral being a more likely question than lime the fruit. I'd expect someone asking about kidney stones would also be asking about foods that are commonly consumed.

                      This kind of just goes to show there's multiple ways something can be interpreted. Maybe a smart human would ask for clarification, but for sure AIs today will just happily spit out the first answer that comes up. LLMs are extremely "good" at making up answers to leading questions, even if it's completely false.

[email protected] (#86):

Making up answers is kinda their entire purpose. LLMs are fundamentally just a text-generation algorithm: they are designed to produce text that looks like it could have been written by a human. Which they are amazing at, especially when you consider how many paragraphs of instructions you can give them, which they tend to follow rather successfully.

The one thing they can't do is verify whether what they are talking about is true, as it's all just slapping words together using probabilities. If they could, they would stop being LLMs and start being AGIs.

[email protected] wrote:

                        I see a lot of misunderstandings in the comments 🫤

                        This is a pretty important finding for researchers, and it's not obvious by any means. This finding is not showing a problem with LLMs' abilities in general. The issue they discovered is specifically for so-called "reasoning models" that iterate on their answer before replying. It might indicate that the training process is not sufficient for true reasoning.

                        Most reasoning models are not incentivized to think correctly, and are only rewarded based on their final answer. This research might indicate that's a flaw that needs to be corrected before models can actually reason.

[email protected] (#87):

When given explicit instructions to follow, the models failed because they had not seen similar instructions before.

                        This paper shows that there is no reasoning in LLMs at all, just extended pattern matching.

[email protected] wrote:

I think it's important to note (I'm not an LLM, I know that phrase triggers you to assume I am) that they haven't proven this is an inherent architectural issue, which I think would be the next step for the assertion.

Do we know that they don't reason and are incapable of reasoning, or do we just know that for these problems they jump to memorized solutions? Is it possible to create an arrangement of weights that can genuinely reason, even if the current models don't? That's the big question that needs answering. It's still possible that we just haven't properly incentivized reasoning over memorization during training.

If someone can objectively answer "no" to that, the bubble collapses.

[email protected] (#88):

                          do we know that they don't and are incapable of reasoning.

"even when we provide the algorithm in the prompt—so that the model only needs to execute the prescribed steps—performance does not improve"

[email protected] wrote:

                            LOOK MAA I AM ON FRONT PAGE

[email protected] (#89):

                            XD so, like a regular school/university student that just wants to get passing grades?

[email protected] wrote:

Intelligence has a very clear definition.

It requires the ability to acquire knowledge, understand knowledge and use knowledge.

No one has been able to create a system that can understand knowledge, therefore none of it is artificial intelligence. Each generation is merely a more and more complex knowledge model. Useful in many ways, but never intelligent.

[email protected] (#90):

                              Dog has a very clear definition, so when you call a sausage in a bun a "Hot Dog", you are actually a fool.

                              Smart has a very clear definition, so no, you do not have a "Smart Phone" in your pocket.

                              Also, that is not the definition of intelligence. But the crux of the issue is that you are making up a definition for AI that suits your needs.

[email protected] wrote:

                                By that metric, you can argue Kasparov isn't thinking during chess, either. A lot of human chess "thinking" is recalling memorized openings, evaluating positions many moves deep, and other tasks that map to what a chess engine does. Of course Kasparov is thinking, but then you have to conclude that the AI is thinking too. Thinking isn't a magic process, nor is it tightly coupled to human-like brain processes as we like to think.

[email protected] (#91):

                                By that metric, you can argue Kasparov isn’t thinking during chess

                                Kasparov's thinking fits pretty much all biological definitions of thinking. Which is the entire point.

[email protected] wrote:

                                  I'm not sure how you arrived at lime the mineral being a more likely question than lime the fruit. I'd expect someone asking about kidney stones would also be asking about foods that are commonly consumed.

                                  This kind of just goes to show there's multiple ways something can be interpreted. Maybe a smart human would ask for clarification, but for sure AIs today will just happily spit out the first answer that comes up. LLMs are extremely "good" at making up answers to leading questions, even if it's completely false.

[email protected] (#92):

                                  A well trained model should consider both types of lime. Failure is likely down to temperature and other model settings. This is not a measure of intelligence.

[email protected] wrote:

                                    LOOK MAA I AM ON FRONT PAGE

[email protected] (#93):

                                    It's all "one instruction at a time" regardless of high processor speeds and words like "intelligent" being bandied about. "Reason" discussions should fall into the same query bucket as "sentience".

[email protected] wrote:

                                      LOOK MAA I AM ON FRONT PAGE

[email protected] (#94):

                                      I don't think the article summarizes the research paper well. The researchers gave the AI models simple-but-large (which they confusingly called "complex") puzzles. Like Towers of Hanoi but with 25 discs.

                                      The solution to these puzzles is nothing but patterns. You can write code that will solve the Tower puzzle for any size n and the whole program is less than a screen.

                                      The problem the researchers see is that on these long, pattern-based solutions, the models follow a bad path and then just give up long before they hit their limit on tokens. The researchers don't have an answer for why this is, but they suspect that the reasoning doesn't scale.
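For reference, the point about program size is easy to check: a complete recursive Towers of Hanoi solver fits in a few lines, and the optimal move count for n discs is 2**n − 1 (so 33,554,431 moves for a 25-disc puzzle). A minimal sketch:

```python
def hanoi(n, src="A", aux="B", dst="C", moves=None):
    """Append the optimal move sequence for n discs to `moves` and return it."""
    if moves is None:
        moves = []
    if n > 0:
        hanoi(n - 1, src, dst, aux, moves)  # clear n-1 discs onto the spare peg
        moves.append((src, dst))            # move the largest disc to the target
        hanoi(n - 1, aux, src, dst, moves)  # restack the n-1 discs on top of it
    return moves

print(len(hanoi(3)))  # 7, i.e. 2**3 - 1
print(2**25 - 1)      # 33554431 moves for a 25-disc puzzle
```

The solution is pure pattern repetition, which is why the puzzle scales problem length without scaling conceptual difficulty.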

[email protected] wrote:

                                        LOOK MAA I AM ON FRONT PAGE

[email protected] (#95):

                                        It's not just the memorization of patterns that matters, it's the recall of appropriate patterns on demand. Call it what you will, even if AI is just a better librarian for search work, that's value - that's the new Google.

[email protected] wrote:

                                          It's all "one instruction at a time" regardless of high processor speeds and words like "intelligent" being bandied about. "Reason" discussions should fall into the same query bucket as "sentience".

[email protected] (#96):

                                          My impression of LLM training and deployment is that it's actually massively parallel in nature - which can be implemented one instruction at a time - but isn't in practice.
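A small way to see the "parallel in nature" point: every element of a matrix-vector product (the core operation in LLM inference) is an independent dot product, so hardware is free to compute them in any order or all at once. A plain-Python sketch with toy numbers, no GPU required:

```python
W = [[1, 2], [3, 4], [5, 6]]  # toy weight matrix
x = [10, 1]                   # toy activation vector

def dot(row, vec):
    """Dot product of one matrix row with the input vector."""
    return sum(r * v for r, v in zip(row, vec))

# Serial: one output element at a time, "one instruction at a time".
serial = [dot(row, x) for row in W]

# Same result with the rows processed in reverse order: there is no data
# dependency between output elements, which is what makes the computation
# trivially parallelizable on a GPU.
reordered = [dot(row, x) for row in reversed(W)][::-1]

print(serial)  # [12, 34, 56]
assert serial == reordered
```

The absence of cross-element dependencies is exactly what lets the same math run serially on a CPU or massively parallel on accelerator hardware.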
