agnos.is Forums

Home › Technology

Anthropic has developed an AI 'brain scanner' to understand how LLMs work and it turns out the reason why chatbots are terrible at simple math and hallucinate is weirder than you thought

technology
163 Posts 97 Posters 658 Views
Guest wrote:

    To understand what's actually happening, Anthropic's researchers developed a new technique, called circuit tracing, to track the decision-making processes inside a large language model step-by-step. They then applied it to their own Claude 3.5 Haiku LLM.

    Anthropic says its approach was inspired by the brain scanning techniques used in neuroscience and can identify components of the model that are active at different times. In other words, it's a little like a brain scanner spotting which parts of the brain are firing during a cognitive process.

    This is why LLMs are so patchy at math. (Image credit: Anthropic)

    Anthropic made lots of intriguing discoveries using this approach, not least of which is why LLMs are so terrible at basic mathematics. "Ask Claude to add 36 and 59 and the model will go through a series of odd steps, including first adding a selection of approximate values (add 40ish and 60ish, add 57ish and 36ish). Towards the end of its process, it comes up with the value 92ish. Meanwhile, another sequence of steps focuses on the last digits, 6 and 9, and determines that the answer must end in a 5. Putting that together with 92ish gives the correct answer of 95," the MIT article explains.
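
The two-track procedure described above can be caricatured in code. This is purely a toy illustration of the article's description (a fuzzy magnitude estimate combined with an exact last digit), not how Claude actually computes; all names and the noise model are invented:

```python
import random

def fuzzy_estimate(a: int, b: int, trials: int = 100) -> float:
    """Average several noisy readings of the operands ("40ish + 60ish")."""
    return sum(a + random.randint(-4, 4) + b + random.randint(-4, 4)
               for _ in range(trials)) / trials   # hovers near the true sum ("92ish")

def heuristic_add(a: int, b: int) -> int:
    est = fuzzy_estimate(a, b)                    # track 1: rough magnitude
    last = (a % 10 + b % 10) % 10                 # track 2: exact ones digit (6 + 9 -> ends in 5)
    base = round(est)
    # Combine: pick the value near the estimate whose last digit matches.
    candidates = [n for n in range(base - 9, base + 10) if n % 10 == last]
    return min(candidates, key=lambda n: abs(n - est))

print(heuristic_add(36, 59))  # 95
```

The point of the sketch is that neither track alone gives the answer: the estimate narrows the range, the last digit pins the value within it.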

    But here's the really funky bit. If you ask Claude how it got the correct answer of 95, it will apparently tell you, "I added the ones (6+9=15), carried the 1, then added the 10s (3+5+1=9), resulting in 95." But that actually only reflects common answers in its training data as to how the sum might be completed, as opposed to what it actually did.

    In other words, not only does the model use a very, very odd method to do the maths, you can't trust its explanations as to what it has just done. That's significant and shows that model outputs cannot be relied upon when designing guardrails for AI. Their internal workings need to be understood, too.

    Another very surprising outcome of the research is the discovery that these LLMs do not, as is widely assumed, operate by merely predicting the next word. By tracing how Claude generated rhyming couplets, Anthropic found that it chose the rhyming word at the end of verses first, then filled in the rest of the line.
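
The "choose the rhyme word first, then fill in the line" behaviour can be sketched as a two-step plan. Hypothetical illustration only — the rhyme table and line stem are invented, and this is not Claude's actual mechanism:

```python
# Toy rhyme table; a real system would have something far richer.
RHYMES = {"night": ["light", "sight", "bright"]}

def second_line(first_line: str) -> str:
    last_word = first_line.rstrip(".,!?").split()[-1].lower()
    target = RHYMES[last_word][0]          # step 1: commit to the rhyme word first
    stem = "and then I saw the"            # step 2: build the rest of the line toward it
    return f"{stem} {target}"

print(second_line("He wandered out into the night"))  # and then I saw the light
```

The contrast is with pure left-to-right generation, which would only discover at the final token whether a rhyme is even possible.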

    "The planning thing in poems blew me away," says Batson. "Instead of at the very last minute trying to make the rhyme make sense, it knows where it’s going."

    Anthropic discovered that their Claude LLM didn't just predict the next word. (Image credit: Anthropic)

    Anthropic also found, among other things, that Claude "sometimes thinks in a conceptual space that is shared between languages, suggesting it has a kind of universal 'language of thought'."

    Anywho, there's apparently a long way to go with this research. According to Anthropic, "it currently takes a few hours of human effort to understand the circuits we see, even on prompts with only tens of words." And the research doesn't explain how the structures inside LLMs are formed in the first place.

    But it has shone a light on at least some parts of how these oddly mysterious AI beings—which we have created but don't understand—actually work. And that has to be a good thing.

[email protected] wrote (#7):

    Thanks for copypasting here. I wonder if the "prediction" is not as expected only in that case, when making rhymes. I also notice that its way of counting feels interestingly not too different from how I count when I need to come up fast with an approximate sum.

[email protected] wrote (#8):

The other day I asked an LLM to create a partial number chart to help my son learn which numbers are next to each other. If I instructed it to do this using very detailed instructions, it failed miserably every time. And sometimes when I even told it to correct specific things about its answer, it still basically ignored me. The only way I could get it to do what I wanted consistently was to break the task down into small steps and tell it to show me its progress.

I'd be very interested to learn its "thought process" in each of those scenarios.

In reply to [email protected]:

[email protected] wrote (#9):

        Isn't that the "new math" everyone was talking about?

[email protected] wrote:

          That bit about how it turns out they aren't actually just predicting the next word is crazy and kinda blows the whole "It's just a fancy text auto-complete" argument out of the water IMO

[email protected] wrote (#10):

          I read an article that it can "think" in small chunks. They don't know how much though. This was also months ago, it's probably expanded by now.

[email protected] wrote (#11):

It's amazing that humans have coded a tool and then have to write more tools afterwards to analyze how it works.

In reply to the article excerpt above:

[email protected] wrote (#12):

              Is that a weird method of doing math?

              I mean, if you give me something borderline nontrivial like, say 72 times 13, I will definitely do some similar stuff. "Well it's more than 700 for sure, but it looks like less than a thousand. Three times seven is 21, so two hundred and ten, so it's probably in the 900s. Two times 13 is 26, so if you add that to the 910 it's probably 936, but I should check that in a calculator."

              Do you guys not do that? Is that a me thing?

In reply to [email protected]:

[email protected] wrote (#13):

                Predicting the next word vs predicting a word in the middle and then predicting backwards are not hugely different things. It's still predicting parts of the passage based solely on other parts of the passage.

                Compared to a human who forms an abstract thought and then translates that thought into words. Which words I use has little to do with which other words I've used except to make sure I'm following the rules of grammar.

In reply to [email protected]:

[email protected] wrote (#14):

                  Nah I do similar stuff. I think very few people actually trace their own lines of thought, so they probably don’t realize this is how it often works.

[email protected] wrote (#15):

                    This is great stuff. If we can properly understand these “flows” of intelligence, we might be able to write optimized shortcuts for them, vastly improving performance.

In reply to the article excerpt above:

[email protected] wrote (#16):

This reminds me of learning a shortcut in math class while also knowing that the lesson didn't cover that particular method. So I use the shortcut to get the answer on a multiple-choice question, but I use the method from the lesson when asked to show my work (e.g. Pascal's Pyramid vs. binomial expansion).

It might not seem like a shortcut to us, but something about this LLM's training makes it easier for it to use heuristics. That's actually a pretty big deal for a machine: choosing fuzzy logic over the algorithm when it knows the teacher wants it to use the algorithm.

[email protected] wrote (#17):

                        The math example in particular is very interesting, and makes me wonder if we could splice a calculator into the model, basically doing "brain surgery" to short circuit the learned arithmetic process and replace it.

[email protected] wrote:

                          Rather than read PCGamer talk about Anthropic's article you can just read it directly here. It's a good read.

[email protected] wrote (#18):

I think this comm is more suited for news articles talking about it, though I did post that link to [email protected], which I think is a better fit for those who want to go more in-depth on it.

In reply to [email protected]:

[email protected] wrote (#19):

                            Huh. I visualize a whiteboard in my head. Then I...do the math.

                            I'm also fairly certain I'm autistic, so... ¯\_(ツ)_/¯

[email protected] wrote (#20):

                              I do much the same in my head.

                              Know what's crazy? We sling bags of mulch, dirt and rocks onto customer vehicles every day. No one, neither coworkers nor customers, will do simple multiplication. Only the most advanced workers do it. No lie.

                              Customer wants 30 bags of mulch. I look at the given space:

                              "Let's do 6 stacks of 5."

                              Everyone proceeds to sling shit around in random piles and count as we go. And then someone loses track and has to shift shit around to check the count.

[email protected] wrote (#21):

Well, I guess I do a bit of the same :) I do (70+2)(10+3) -> 700+210+20+6
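
Written out, that expansion is just the four partial products of the tens/ones split. A quick sketch (only meant for two-digit operands; names are invented):

```python
def partial_products(a: int, b: int) -> list[int]:
    """Split two-digit a and b into tens and ones and expand (at + ao)(bt + bo)."""
    a_tens, a_ones = divmod(a, 10)
    b_tens, b_ones = divmod(b, 10)
    return [a_tens * b_tens * 100,   # 70 * 10 = 700
            a_tens * b_ones * 10,    # 70 * 3  = 210
            a_ones * b_tens * 10,    # 2  * 10 = 20
            a_ones * b_ones]         # 2  * 3  = 6

print(partial_products(72, 13), sum(partial_products(72, 13)))  # [700, 210, 20, 6] 936
```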

In reply to [email protected]:

[email protected] wrote (#22):

                                  That math process for adding the two numbers - there's nothing wrong with it at all. Estimate the total and come up with a range. Determine exactly what the last digit is. In the example, there's only one number in the range with 5 as the last digit. That must be the answer. Hell, I might even use that same method in my own head.

                                  The poetry example, people use that one often enough, too. Come up with a couple of words you would have fun rhyming, and build the lines around those words. Nothing wrong with that, either.

                                  These two processes are closer to "thought" than I previously imagined.

In reply to [email protected]:

[email protected] wrote (#23):

                                    It really doesn't. You're just describing the "fancy" part of "fancy autocomplete." No one was ever really suggesting that they only predict the next word. If that was the case they would just be autocomplete, nothing fancy about it.

                                    What's being conveyed by "fancy autocomplete" is that these models ultimately operate by combining the most statistically likely elements of their dataset, with some application of random noise. More noise creates more "creative" (meaning more random, less probable) outputs. They do not actually "think" as we understand thought. This can clearly be seen in the examples given in the article, especially to do with math. The model is throwing together elements that are statistically proximate to the prompt. It's not actually applying a structured, logical method the way humans can be taught to.
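
The noise/"creativity" trade-off mentioned here is usually exposed as a temperature parameter on the output distribution. A minimal, generic softmax-sampling sketch (not any particular vendor's API):

```python
import math
import random

def sample_token(logits: list[float], temperature: float = 1.0) -> int:
    """Sample an index from softmax(logits / temperature).
    Low temperature -> near-greedy picks; high temperature -> a flatter
    distribution, so less probable ("more creative") tokens win more often."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                               # subtract max for numeric stability
    weights = [math.exp(s - m) for s in scaled]
    return random.choices(range(len(logits)), weights=weights, k=1)[0]

print(sample_token([1.0, 5.0, 2.0], temperature=0.01))  # 1 (near-greedy: the max logit)
```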

In reply to [email protected]:

[email protected] wrote (#24):

                                      Compared to a human who forms an abstract thought and then translates that thought into words. Which words I use has little to do with which other words I’ve used except to make sure I’m following the rules of grammar.

                                      Interesting that...

                                      Anthropic also found, among other things, that Claude "sometimes thinks in a conceptual space that is shared between languages, suggesting it has a kind of universal 'language of thought'."

In reply to [email protected]:

[email protected] wrote (#25):

Anything that claims it "thinks" in any way I immediately dismiss as an advertisement of some sort. These models are doing very interesting things, but it is in no way "thinking" as a sentient mind does.

In reply to [email protected]:

[email protected] wrote (#26):

                                          I wish I could find the article. It was researchers and they were freaked out just as much as anyone else. It's like slightly over chance that it "thought," not some huge revolutionary leap.
