agnos.is Forums

Anthropic has developed an AI 'brain scanner' to understand how LLMs work and it turns out the reason why chatbots are terrible at simple math and hallucinate is weirder than you thought

Technology · 163 Posts, 97 Posters, 658 Views
  • cm0002@lemmy.world wrote:
    This post did not contain any content.

    Guest
    #2

    To understand what's actually happening, Anthropic's researchers developed a new technique, called circuit tracing, to track the decision-making processes inside a large language model step-by-step. They then applied it to their own Claude 3.5 Haiku LLM.

    Anthropic says its approach was inspired by the brain scanning techniques used in neuroscience and can identify components of the model that are active at different times. In other words, it's a little like a brain scanner spotting which parts of the brain are firing during a cognitive process.

    This is why LLMs are so patchy at math. (Image credit: Anthropic)

    Anthropic made lots of intriguing discoveries using this approach, not least of which is why LLMs are so terrible at basic mathematics. "Ask Claude to add 36 and 59 and the model will go through a series of odd steps, including first adding a selection of approximate values (add 40ish and 60ish, add 57ish and 36ish). Towards the end of its process, it comes up with the value 92ish. Meanwhile, another sequence of steps focuses on the last digits, 6 and 9, and determines that the answer must end in a 5. Putting that together with 92ish gives the correct answer of 95," the MIT article explains.
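
    As a rough illustration of those two parallel paths (a ballpark estimate plus a separate last-digit check), here's a toy sketch in Python. It only mimics the flavour of the description above; it is not a claim about how Claude's circuits are actually wired:

    ```python
    # Toy sketch of the two parallel paths described above: one path makes a
    # rough estimate of the total, another works out the exact last digit,
    # and the two are reconciled at the end. Purely illustrative.

    def fuzzy(n: int) -> int:
        """Blur a number slightly; the result stays within +/-2 of n."""
        return (n // 10) * 10 + (2 if n % 10 < 5 else 7)

    def add_by_estimate_and_last_digit(a: int, b: int) -> int:
        rough = fuzzy(a) + fuzzy(b)          # e.g. 37 + 57 = 94 ("92ish")
        last_digit = (a % 10 + b % 10) % 10  # (6 + 9) % 10 = 5
        # The rough estimate is within +/-4 of the true sum, so exactly one
        # number in that window ends in last_digit: that's the answer.
        for candidate in range(rough - 4, rough + 5):
            if candidate % 10 == last_digit:
                return candidate
        raise AssertionError("unreachable for integer inputs")

    print(add_by_estimate_and_last_digit(36, 59))  # 95
    ```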

    But here's the really funky bit. If you ask Claude how it got the correct answer of 95, it will apparently tell you, "I added the ones (6+9=15), carried the 1, then added the 10s (3+5+1=9), resulting in 95." But that actually only reflects common answers in its training data as to how the sum might be completed, as opposed to what it actually did.

    In other words, not only does the model use a very, very odd method to do the maths, but you also can't trust its explanations of what it has just done. That's significant: it shows that model outputs cannot be relied upon when designing guardrails for AI. Their internal workings need to be understood, too.

    Another very surprising outcome of the research is the discovery that these LLMs do not, as is widely assumed, operate by merely predicting the next word. By tracing how Claude generated rhyming couplets, Anthropic found that it chose the rhyming word at the end of verses first, then filled in the rest of the line.

    "The planning thing in poems blew me away," says Batson. "Instead of at the very last minute trying to make the rhyme make sense, it knows where it’s going."

    Anthropic discovered that their Claude LLM didn't just predict the next word. (Image credit: Anthropic)

    Anthropic also found, among other things, that Claude "sometimes thinks in a conceptual space that is shared between languages, suggesting it has a kind of universal 'language of thought'."

    Anywho, there's apparently a long way to go with this research. According to Anthropic, "it currently takes a few hours of human effort to understand the circuits we see, even on prompts with only tens of words." And the research doesn't explain how the structures inside LLMs are formed in the first place.

    But it has shone a light on at least some parts of how these oddly mysterious AI beings—which we have created but don't understand—actually work. And that has to be a good thing.

    • cm0002@lemmy.world wrote:
      This post did not contain any content.

      funnyusername@lemmy.world
      #3

      This is one of the most interesting things about LLMs that I have ever read.

      • funnyusername@lemmy.world wrote:

        This is one of the most interesting things about LLMs that I have ever read.

        cm0002@lemmy.world
        #4

        That bit about how it turns out they aren't actually just predicting the next word is crazy and kinda blows the whole "It's just a fancy text auto-complete" argument out of the water IMO

        • cm0002@lemmy.world wrote:
          This post did not contain any content.

          bell@lemmy.world
          #5

          How can I take an article that uses the word "anywho" seriously?

          • cm0002@lemmy.world wrote:
            This post did not contain any content.

            simple@lemm.ee
            #6

            Rather than read PCGamer talk about Anthropic's article, you can just read it directly here. It's a good read.

            • ? Guest wrote (quoting the article in full; see post #2 above):

              kami@lemmy.dbzer0.com
              #7

              Thanks for copy-pasting it here. I wonder if the "prediction" departs from the expected next-word pattern only in that case, when making rhymes. I also notice that its way of counting feels interestingly not too different from how I count when I need to come up with an approximate sum quickly.

              • cm0002@lemmy.world wrote:
                This post did not contain any content.

                B ([email protected])
                #8

                The other day I asked an LLM to create a partial number chart to help my son learn which numbers are next to each other. When I gave it very detailed instructions up front, it failed miserably every time. And sometimes, even when I told it to correct specific things about its answer, it still basically ignored me. The only way I could get it to do what I wanted consistently was to break the task down into small steps and tell it to show me its progress.

                I'd be very interested to learn its "thought process" in each of those scenarios.
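
                For illustration, the step-by-step structure described above might look something like this in outline. The wording is invented and ask() is a hypothetical stand-in, not a real chatbot API:

                ```python
                # Sketch of the "small steps, show your progress" approach described above.
                # ask() is a hypothetical placeholder; no real chatbot API is called here.

                def ask(prompt: str) -> str:
                    return "(model reply would appear here)"  # stand-in for a real model call

                steps = [
                    "List the whole numbers from 10 to 20, one per line.",
                    "Next to each number, write the number that comes just before it.",
                    "Now also write the number that comes just after it.",
                    "Replace about a third of the before/after entries with blanks.",
                    "Show me the full chart before we continue.",
                ]

                for step in steps:
                    reply = ask(step)   # send one small instruction at a time
                    print(reply)        # review the output before moving on to the next step
                ```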

                • kami@lemmy.dbzer0.com wrote:

                  Thanks for copy-pasting it here. I wonder if the "prediction" departs from the expected next-word pattern only in that case, when making rhymes. I also notice that its way of counting feels interestingly not too different from how I count when I need to come up with an approximate sum quickly.

                  pelespirit@sh.itjust.works
                  #9

                  Isn't that the "new math" everyone was talking about?

                  • cm0002@lemmy.world wrote:

                    That bit about how it turns out they aren't actually just predicting the next word is crazy and kinda blows the whole "It's just a fancy text auto-complete" argument out of the water IMO

                    pelespirit@sh.itjust.works
                    #10

                    I read an article that said it can "think" in small chunks. They don't know how much, though. That was also months ago, so it's probably expanded by now.

                    • cm0002@lemmy.world wrote:
                      This post did not contain any content.

                      G ([email protected])
                      #11

                      It's amazing that humans have coded a tool for which they then have to write more tools to analyze how it works.

                      • ? Guest wrote (quoting the article in full; see post #2 above):

                        mudman@fedia.io
                        #12

                        Is that a weird method of doing math?

                        I mean, if you give me something borderline nontrivial like, say 72 times 13, I will definitely do some similar stuff. "Well it's more than 700 for sure, but it looks like less than a thousand. Three times seven is 21, so two hundred and ten, so it's probably in the 900s. Two times 13 is 26, so if you add that to the 910 it's probably 936, but I should check that in a calculator."

                        Do you guys not do that? Is that a me thing?
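
                        Spelled out, that estimate-then-refine chain is just a partial-products decomposition; here's a quick check of the arithmetic in the comment above:

                        ```python
                        # Quick check of the mental arithmetic above: 72 x 13 as partial
                        # products, alongside the rough steps mentioned in the comment.
                        a, b = 72, 13
                        tens_part = 70 * b        # 70 x 13 = 910 ("three times seven is 21, so ... the 900s")
                        ones_part = 2 * b         # 2 x 13 = 26
                        total = tens_part + ones_part
                        print(tens_part, ones_part, total)  # 910 26 936
                        assert total == a * b
                        ```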

                        • cm0002@lemmy.world wrote:

                          That bit about how it turns out they aren't actually just predicting the next word is crazy and kinda blows the whole "It's just a fancy text auto-complete" argument out of the water IMO

                          C ([email protected])
                          #13

                          Predicting the next word vs predicting a word in the middle and then predicting backwards are not hugely different things. It's still predicting parts of the passage based solely on other parts of the passage.

                          Compared to a human who forms an abstract thought and then translates that thought into words. Which words I use has little to do with which other words I've used except to make sure I'm following the rules of grammar.

                          • mudman@fedia.io wrote:

                            Is that a weird method of doing math?

                            I mean, if you give me something borderline nontrivial like, say 72 times 13, I will definitely do some similar stuff. "Well it's more than 700 for sure, but it looks like less than a thousand. Three times seven is 21, so two hundred and ten, so it's probably in the 900s. Two times 13 is 26, so if you add that to the 910 it's probably 936, but I should check that in a calculator."

                            Do you guys not do that? Is that a me thing?

                            P ([email protected])
                            #14

                            Nah I do similar stuff. I think very few people actually trace their own lines of thought, so they probably don’t realize this is how it often works.

                            • cm0002@lemmy.world wrote:
                              This post did not contain any content.

                              P ([email protected])
                              #15

                              This is great stuff. If we can properly understand these “flows” of intelligence, we might be able to write optimized shortcuts for them, vastly improving performance.

                              • ? Guest wrote (quoting the article in full; see post #2 above):

                                N ([email protected])
                                #16

                                This reminds me of learning a shortcut in math class while also knowing that the lesson didn't cover that particular method. So I use the shortcut to get the answer on a multiple-choice question, but I use the method from the lesson when asked to show my work (e.g. Pascal's Pyramid vs binomial expansion).

                                It might not seem like a shortcut to us, but something about this LLM's training makes it easier to use heuristics. It's actually a pretty big deal for a machine to choose fuzzy logic over algorithms when it knows that the teacher wants it to use the algorithm.
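
                                Assuming the shortcut in question is reading binomial coefficients straight off a Pascal's-triangle row rather than multiplying out the expansion (my reading of the example, not the commenter's actual homework), the two routes look like this side by side:

                                ```python
                                # Shortcut vs "show your work" for the coefficients of (x + y)^n:
                                # read them off a Pascal's-triangle row, or compute each one from
                                # the binomial formula n! / (k!(n-k)!). Same answer either way.
                                from math import comb

                                def pascal_row(n: int) -> list[int]:
                                    row = [1]
                                    for _ in range(n):
                                        row = [1] + [row[i] + row[i + 1] for i in range(len(row) - 1)] + [1]
                                    return row

                                n = 4
                                shortcut = pascal_row(n)                         # [1, 4, 6, 4, 1]
                                shown_work = [comb(n, k) for k in range(n + 1)]  # same list via the formula
                                assert shortcut == shown_work
                                print(shortcut)
                                ```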

                                • cm0002@lemmy.world wrote:
                                  This post did not contain any content.

                                  M ([email protected])
                                  #17

                                  The math example in particular is very interesting, and makes me wonder if we could splice a calculator into the model, basically doing "brain surgery" to short circuit the learned arithmetic process and replace it.
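
                                  A low-tech cousin of that idea already exists as tool calling: intercept the arithmetic and hand it to exact code instead of the learned circuit. Here's a rough sketch of the routing; it's ordinary dispatch logic rather than model surgery, and the llm parameter is a placeholder, not any particular vendor's API:

                                  ```python
                                  # Rough sketch of "splice in a calculator" as plain tool routing:
                                  # arithmetic questions are diverted to exact code, everything else
                                  # goes to the model. No real LLM API is used here.
                                  import ast
                                  import operator
                                  import re

                                  _OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
                                          ast.Mult: operator.mul, ast.Div: operator.truediv}

                                  def calc(expr: str) -> float:
                                      """Safely evaluate +-*/ expressions by walking the parsed AST."""
                                      def walk(node):
                                          if isinstance(node, ast.Expression):
                                              return walk(node.body)
                                          if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
                                              return _OPS[type(node.op)](walk(node.left), walk(node.right))
                                          if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
                                              return node.value
                                          raise ValueError("not plain arithmetic")
                                      return walk(ast.parse(expr, mode="eval"))

                                  def answer(prompt: str, llm=lambda p: "(imaginary model output)") -> str:
                                      m = re.fullmatch(r"\s*what is ([\d\s+\-*/().]+)\?\s*", prompt, re.IGNORECASE)
                                      if m:                          # arithmetic: route around the model entirely
                                          return str(calc(m.group(1)))
                                      return llm(prompt)             # everything else: let the model handle it

                                  print(answer("What is 36 + 59?"))  # 95
                                  ```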

                                  • simple@lemm.ee wrote:

                                    Rather than read PCGamer talk about Anthropic's article, you can just read it directly here. It's a good read.

                                    cm0002@lemmy.world
                                    #18

                                    I think this comm is more suited for news articles talking about it, though I did post that link to [email protected], which I think would be a better fit for those who want to go more in-depth on it.

                                    • P ([email protected]) wrote:

                                      Nah I do similar stuff. I think very few people actually trace their own lines of thought, so they probably don’t realize this is how it often works.

                                      F ([email protected])
                                      #19

                                      Huh. I visualize a whiteboard in my head. Then I...do the math.

                                      I'm also fairly certain I'm autistic, so... ¯\_(ツ)_/¯

                                      • S ([email protected])
                                        #20

                                        I do much the same in my head.

                                        Know what's crazy? We sling bags of mulch, dirt and rocks onto customer vehicles every day. No one, neither coworkers nor customers, will do simple multiplication. Only the most advanced workers do it. No lie.

                                        Customer wants 30 bags of mulch. I look at the given space:

                                        "Let's do 6 stacks of 5."

                                        Everyone proceeds to sling shit around in random piles and count as we go. And then someone loses track and has to shift shit around to check the count.

                                        • T ([email protected])
                                          #21

                                          Well, I guess I do a bit of the same :) I do (70+2)(10+3) -> 700+210+20+6.
