
agnos.is Forums


Apple just proved AI "reasoning" models like Claude, DeepSeek-R1, and o3-mini don't actually reason at all. They just memorize patterns really well.

210 Posts 93 Posters 0 Views
  • M [email protected]

    I see a lot of misunderstandings in the comments 🫤

    This is a pretty important finding for researchers, and it's not obvious by any means. This finding is not showing a problem with LLMs' abilities in general. The issue they discovered is specifically for so-called "reasoning models" that iterate on their answer before replying. It might indicate that the training process is not sufficient for true reasoning.

    Most reasoning models are not incentivized to think correctly, and are only rewarded based on their final answer. This research might indicate that's a flaw that needs to be corrected before models can actually reason.

    K This user is from outside of this forum
    [email protected]
    wrote on last edited by
    #87

    When given explicit instructions to follow, models failed because they had not seen similar instructions before.

    This paper shows that there is no reasoning in LLMs at all, just extended pattern matching.

    M 1 Reply Last reply
    10
    • communist@lemmy.frozeninferno.xyzC [email protected]

      I think it's important to note (I'm not an LLM; I know that phrase triggers you to assume I am) that they haven't proven this is an inherent architectural issue, which I think would be the next step needed to support that assertion.

      Do we know that they don't and are incapable of reasoning, or do we just know that for x problems they jump to memorized solutions? Is it possible to create an arrangement of weights that can genuinely reason, even if the current models don't? That's the big question that needs to be answered. It's still possible that we just haven't properly incentivized reasoning over memorization during training.

      If someone can objectively answer "no" to that, the bubble collapses.

      K This user is from outside of this forum
      [email protected]
      wrote on last edited by
      #88

      do we know that they don't and are incapable of reasoning.

      "even when we provide the
      algorithm in the prompt—so that the model only needs to execute the prescribed steps—performance does not improve"

      communist@lemmy.frozeninferno.xyzC 1 Reply Last reply
      1
      • A [email protected]

        LOOK MAA I AM ON FRONT PAGE

        H This user is from outside of this forum
        [email protected]
        wrote on last edited by
        #89

        XD so, like a regular school/university student that just wants to get passing grades?

        1 Reply Last reply
        5
        • C [email protected]

          Intelligence has a very clear definition.

          It requires the ability to acquire knowledge, understand knowledge, and use knowledge.

          No one has been able to create a system that can understand knowledge; therefore, to me, none of it is artificial intelligence. Each generation is merely a more complex knowledge model. Useful in many ways, but never intelligent.

          G This user is from outside of this forum
          [email protected]
          wrote on last edited by
          #90

          Dog has a very clear definition, so when you call a sausage in a bun a "Hot Dog", you are actually a fool.

          Smart has a very clear definition, so no, you do not have a "Smart Phone" in your pocket.

          Also, that is not the definition of intelligence. But the crux of the issue is that you are making up a definition for AI that suits your needs.

          C 1 Reply Last reply
          1
          • C [email protected]

            By that metric, you can argue Kasparov isn't thinking during chess, either. A lot of human chess "thinking" is recalling memorized openings, evaluating positions many moves deep, and other tasks that map to what a chess engine does. Of course Kasparov is thinking, but then you have to conclude that the AI is thinking too. Thinking isn't a magic process, nor is it tightly coupled to human-like brain processes as we like to think.

            K This user is from outside of this forum
            [email protected]
            wrote on last edited by
            #91

            By that metric, you can argue Kasparov isn’t thinking during chess

            Kasparov's thinking fits pretty much all biological definitions of thinking. Which is the entire point.

            L 1 Reply Last reply
            1
            • xthexder@l.sw0.comX [email protected]

              I'm not sure how you arrived at lime the mineral being a more likely question than lime the fruit. I'd expect someone asking about kidney stones would also be asking about foods that are commonly consumed.

              This just goes to show there are multiple ways something can be interpreted. Maybe a smart human would ask for clarification, but AIs today will happily spit out the first answer that comes up. LLMs are extremely "good" at making up answers to leading questions, even if those answers are completely false.

              K This user is from outside of this forum
              [email protected]
              wrote on last edited by
              #92

              A well-trained model should consider both types of lime. Failure is likely down to temperature and other model settings. This is not a measure of intelligence.

              1 Reply Last reply
              1
              • A [email protected]

                LOOK MAA I AM ON FRONT PAGE

                M This user is from outside of this forum
                [email protected]
                wrote on last edited by
                #93

                It's all "one instruction at a time" regardless of high processor speeds and words like "intelligent" being bandied about. "Reason" discussions should fall into the same query bucket as "sentience".

                M 1 Reply Last reply
                4
                • A [email protected]

                  LOOK MAA I AM ON FRONT PAGE

                  M This user is from outside of this forum
                  [email protected]
                  wrote on last edited by
                  #94

                  I don't think the article summarizes the research paper well. The researchers gave the AI models simple-but-large (which they confusingly called "complex") puzzles. Like Towers of Hanoi but with 25 discs.

                  The solution to these puzzles is nothing but patterns. You can write code that will solve the Tower puzzle for any size n and the whole program is less than a screen.

                  The problem the researchers see is that on these long, pattern-based solutions, the models follow a bad path and then just give up long before they hit their limit on tokens. The researchers don't have an answer for why this is, but they suspect that the reasoning doesn't scale.
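To illustrate the point that a Tower solver for any size n fits on less than a screen, here is a minimal sketch in Python. It uses the standard recursive approach; the function name `hanoi` and the peg labels "A", "B", "C" are just illustrative choices, not anything taken from the paper.

```python
from typing import List, Tuple

Move = Tuple[str, str]  # (from_peg, to_peg)


def hanoi(n: int, source: str = "A", target: str = "C", spare: str = "B") -> List[Move]:
    """Return the moves that transfer n discs from source to target."""
    if n == 0:
        return []
    # Move the top n-1 discs onto the spare peg, move the largest disc
    # to the target, then move the n-1 discs from the spare onto it.
    return (
        hanoi(n - 1, source, spare, target)
        + [(source, target)]
        + hanoi(n - 1, spare, target, source)
    )


if __name__ == "__main__":
    moves = hanoi(3)
    print(len(moves))  # 7, i.e. 2**3 - 1
    print(moves[0])    # ('A', 'C')
```

The program is short, but the output is not: the move count is 2**n - 1, so 25 discs need 33,554,431 moves. For sizes like that you would want to yield moves one at a time rather than build the whole list in memory.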

                  1 Reply Last reply
                  25
                  • A [email protected]

                    LOOK MAA I AM ON FRONT PAGE

                    M This user is from outside of this forum
                    [email protected]
                    wrote on last edited by
                    #95

                    It's not just the memorization of patterns that matters, it's the recall of appropriate patterns on demand. Call it what you will, even if AI is just a better librarian for search work, that's value - that's the new Google.

                    C 1 Reply Last reply
                    3
                    • M [email protected]

                      It's all "one instruction at a time" regardless of high processor speeds and words like "intelligent" being bandied about. "Reason" discussions should fall into the same query bucket as "sentience".

                      M This user is from outside of this forum
                      [email protected]
                      wrote on last edited by
                      #96

                      My impression of LLM training and deployment is that it's actually massively parallel in nature - which can be implemented one instruction at a time - but isn't in practice.

                      1 Reply Last reply
                      0
                      • R [email protected]

                        What confuses me is that we seemingly keep pushing away what counts as reasoning. Not too long ago, some smart algorithms or a bunch of instructions for software (if/then) were officially, by definition, software/computer reasoning. Logically, CPUs do it all the time. Suddenly, when AI is doing that with pattern recognition, memory and even more advanced algorithms, it's no longer reasoning? I feel like at this point a more relevant question is "What exactly is reasoning?". Before you answer, understand that most humans seemingly live by pattern recognition, not reasoning.

                        https://en.wikipedia.org/wiki/Reasoning_system

                        M This user is from outside of this forum
                        [email protected]
                        wrote on last edited by
                        #97

                        I think as we approach the uncanny valley of machine intelligence, it's no longer a cute cartoon but a menacing creepy not-quite imitation of ourselves.

                        T 1 Reply Last reply
                        0
                        • M [email protected]

                          It's not just the memorization of patterns that matters, it's the recall of appropriate patterns on demand. Call it what you will, even if AI is just a better librarian for search work, that's value - that's the new Google.

                          C This user is from outside of this forum
                          [email protected]
                          wrote on last edited by
                          #98

                          While that's a fair idea, there are still two issues with it: hallucinations and the cost of running the models.

                          Unfortunately, it takes significant compute resources to produce even simple responses, and these responses can be totally made up but still made to look completely real. It's gotten much better, sure, but blindly trusting these things (which many people do) can have serious consequences.

                          M 1 Reply Last reply
                          6
                          • xatolos@reddthat.comX [email protected]

                            So, what you're saying here is that the A in AI actually stands for artificial, and it's not really intelligent and reasoning.

                            Huh.

                            C This user is from outside of this forum
                            [email protected]
                            wrote on last edited by
                            #99

                            The AI stands for Actually Indians /s

                            1 Reply Last reply
                            1
                            • K [email protected]

                              When given explicit instructions to follow, models failed because they had not seen similar instructions before.

                              This paper shows that there is no reasoning in LLMs at all, just extended pattern matching.

                              M This user is from outside of this forum
                              [email protected]
                              wrote on last edited by
                              #100

                              I'm not trained or paid to reason, I am trained and paid to follow established corporate procedures. On rare occasions my input is sought to improve those procedures, but the vast majority of my time is spent executing tasks governed by a body of (not quite complete, sometimes conflicting) procedural instructions.

                              If AI can execute those procedures as well as, or better than, human employees, I doubt employers will care if it is reasoning or not.

                              K 1 Reply Last reply
                              3
                              • A [email protected]

                                LOOK MAA I AM ON FRONT PAGE

                                B This user is from outside of this forum
                                [email protected]
                                wrote on last edited by
                                #101

                                When are people going to realize that, in its current state, an LLM is not intelligent? It doesn’t reason. It does not have intuition. It’s a word predictor.

                                S J N X S 5 Replies Last reply
                                33
                                • K [email protected]

                                  do we know that they don't and are incapable of reasoning.

                                  "even when we provide the
                                  algorithm in the prompt—so that the model only needs to execute the prescribed steps—performance does not improve"

                                  communist@lemmy.frozeninferno.xyzC This user is from outside of this forum
                                  [email protected]
                                  wrote on last edited by [email protected]
                                  #102

                                  That indicates that this particular model does not follow instructions, not that it is architecturally fundamentally incapable.

                                  K 1 Reply Last reply
                                  0
                                  • A [email protected]

                                    LOOK MAA I AM ON FRONT PAGE

                                    nostradavid@programming.devN This user is from outside of this forum
                                    [email protected]
                                    wrote on last edited by
                                    #103

                                    OK, and? A car doesn't run like a horse either, yet they are still very useful.

                                    I'm fine with the distinction between human reasoning and LLM "reasoning".

                                    B F T 3 Replies Last reply
                                    3
                                    • nostradavid@programming.devN [email protected]

                                      OK, and? A car doesn't run like a horse either, yet they are still very useful.

                                      I'm fine with the distinction between human reasoning and LLM "reasoning".

                                      B This user is from outside of this forum
                                      [email protected]
                                      wrote on last edited by
                                      #104

                                      Then use a different word. "AI" and "reasoning" make people think of Skynet, which is what the weird tech bros want the layperson to think of. LLMs do not "think", but that's not to say I couldn't be persuaded of their utility. But that's not the way they are being marketed.

                                      1 Reply Last reply
                                      6
                                      • K [email protected]

                                        Lots of us who have done some time in search and relevancy knew early on that ML was always largely breathless, overhyped marketing. It was endless buzzwords and misframing from the start, but it raised our salaries. Anything an exec doesn't understand is profitable and worth doing.

                                        W This user is from outside of this forum
                                        [email protected]
                                        wrote on last edited by [email protected]
                                        #105

                                        Machine learning based pattern matching is indeed very useful and profitable when applied correctly. Identify (with confidence levels) features in data that would otherwise take an extremely well trained person. And even then it's just for the cursory search that takes the longest before presenting the highest confidence candidate results to a person for evaluation. Think: scanning medical data for indicators of cancer, reading live data from machines to predict failure, etc.

                                        And what we call "AI" right now is just a much much more user friendly version of pattern matching - the primary feature of LLMs is that they natively interact with plain language prompts.

                                        1 Reply Last reply
                                        1
                                        • communist@lemmy.frozeninferno.xyzC [email protected]

                                          That indicates that this particular model does not follow instructions, not that it is architecturally fundamentally incapable.

                                          K This user is from outside of this forum
                                          [email protected]
                                          wrote on last edited by
                                          #106

                                          Not "this particular model". Frontier LRMs such as OpenAI’s o1/o3, DeepSeek-R, Claude 3.7 Sonnet Thinking, and Gemini Thinking.

                                          The paper shows that Large Reasoning Models as defined today cannot interpret instructions. Their architecture does not allow it.

                                          communist@lemmy.frozeninferno.xyzC 1 Reply Last reply
                                          1