Apple just proved AI "reasoning" models like Claude, DeepSeek-R1, and o3-mini don't actually reason at all. They just memorize patterns really well.

[email protected]

When given explicit instructions to follow, the models failed because they had not seen similar instructions before.

This paper shows that there is no reasoning in LLMs at all, just extended pattern matching.

[email protected]

Do we know that they don't reason and are incapable of it?

"even when we provide the algorithm in the prompt—so that the model only needs to execute the prescribed steps—performance does not improve"

[email protected]

XD so, like a regular school/university student that just wants to get passing grades?

[email protected]

Dog has a very clear definition, so when you call a sausage in a bun a "Hot Dog", you are actually a fool.

Smart has a very clear definition, so no, you do not have a "Smart Phone" in your pocket.

Also, that is not the definition of intelligence. But the crux of the issue is that you are making up a definition for AI that suits your needs.

[email protected]

By that metric, you can argue Kasparov isn’t thinking during chess

Kasparov's thinking fits pretty much all biological definitions of thinking. Which is the entire point.

[email protected]

A well trained model should consider both types of lime. Failure is likely down to temperature and other model settings. This is not a measure of intelligence.

[email protected]

It's all "one instruction at a time" regardless of high processor speeds and words like "intelligent" being bandied about. "Reason" discussions should fall into the same query bucket as "sentience".

[email protected]

I don't think the article summarizes the research paper well. The researchers gave the AI models simple-but-large (which they confusingly called "complex") puzzles. Like Towers of Hanoi but with 25 discs.

The solution to these puzzles is nothing but patterns. You can write code that will solve the Tower puzzle for any size n and the whole program is less than a screen.
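
For illustration, here is a minimal sketch of what such a program might look like in Python. The function name `hanoi` and the peg labels "A", "B", "C" are my own choices, not anything taken from the paper.

```python
def hanoi(n, source="A", target="C", spare="B"):
    """Yield (from_peg, to_peg) moves that transfer n discs from source to target."""
    if n == 0:
        return
    # Park the n-1 smaller discs on the spare peg.
    yield from hanoi(n - 1, source, spare, target)
    # Move the largest disc to its destination.
    yield (source, target)
    # Bring the n-1 smaller discs back on top of it.
    yield from hanoi(n - 1, spare, target, source)


if __name__ == "__main__":
    for move in hanoi(3):
        print(move)
```

The rule is short, but the output is not: n discs need 2**n - 1 moves, so 25 discs means 33,554,431 moves.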

The problem the researchers see is that on these long, pattern-based solutions, the models follow a bad path and then just give up long before they hit their limit on tokens. The researchers don't have an answer for why this is, but they suspect that the reasoning doesn't scale.

[email protected]

It's not just the memorization of patterns that matters, it's the recall of appropriate patterns on demand. Call it what you will, even if AI is just a better librarian for search work, that's value - that's the new Google.

[email protected]

My impression of LLM training and deployment is that it's actually massively parallel in nature - which can be implemented one instruction at a time - but isn't in practice.

[email protected]

I think as we approach the uncanny valley of machine intelligence, it's no longer a cute cartoon but a menacing creepy not-quite imitation of ourselves.

[email protected]

While that's a fair idea, there are still two issues with it: hallucinations and the cost of running the models.

Unfortunately, it takes significant compute resources to produce even simple responses, and these responses can be totally made up but still made to look completely real. It's gotten much better, sure, but blindly trusting these things (which many people do) can have serious consequences.

[email protected]

The AI stands for Actually Indians /s

[email protected]

I'm not trained or paid to reason, I am trained and paid to follow established corporate procedures. On rare occasions my input is sought to improve those procedures, but the vast majority of my time is spent executing tasks governed by a body of (not quite complete, sometimes conflicting) procedural instructions.

If AI can execute those procedures as well as, or better than, human employees, I doubt employers will care if it is reasoning or not.

[email protected]

When are people going to realize that, in its current state, an LLM is not intelligent? It doesn’t reason. It does not have intuition. It’s a word predictor.

[email protected]

That indicates that this particular model does not follow instructions, not that it is architecturally fundamentally incapable.

[email protected]

OK, and? A car doesn't run like a horse either, yet they are still very useful.

I'm fine with the distinction between human reasoning and LLM "reasoning".

[email protected]

Then use a different word. "AI" and "reasoning" make people think of Skynet, which is what the weird tech bros want the lay person to think of. LLMs do not "think", but that's not to say I might not be persuaded of their utility. But that's not the way they are being marketed.

[email protected]

Machine learning based pattern matching is indeed very useful and profitable when applied correctly. Identify (with confidence levels) features in data that would otherwise take an extremely well trained person. And even then it's just for the cursory search that takes the longest before presenting the highest confidence candidate results to a person for evaluation. Think: scanning medical data for indicators of cancer, reading live data from machines to predict failure, etc.

And what we call "AI" right now is just a much much more user friendly version of pattern matching - the primary feature of LLMs is that they natively interact with plain language prompts.

[email protected]

Not "This particular model". Frontier LRMs such as OpenAI’s o1/o3, DeepSeek-R1, Claude 3.7 Sonnet Thinking, and Gemini Thinking.

The paper shows that Large Reasoning Models as defined today cannot interpret instructions. Their architecture does not allow it.
