agnos.is Forums


Can we trust LLM calculations?

Ask Lemmy · 69 posts · 48 posters
  • S [email protected]

Short answer: no.

Long answer: they are still (mostly) statistics-based and can't do real math. You can use the answers from LLMs as a starting point, but you have to rigorously verify the answers they give.

[email protected] · #20

    The whole "two r's in strawberry" thing is enough of an argument for me. If things like that happen at such a low level, its completely impossible that it wont make mistakes with problems that are exponentially more complicated than that.

    • F [email protected]

Ok, you have a moderately complex math problem you needed to solve. You gave the problem to 6 LLMs, all paid versions. All 6 get the same numbers. Would you trust the answer?

[email protected] · #21

      Probably, depending on the context. It is possible that all 6 models were trained on the same misleading data, but not very likely in general.

      Number crunching isn't an obvious LLM use case, though. Depending on the task, having it create code to crunch the numbers, or a step-by-step tutorial on how to derive the formula, would be my preference.
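
For illustration, a minimal sketch of why generated code is easier to trust than a bare number: the code is inspectable and can cross-check itself. The specific task here is hypothetical, not from the thread.

```python
# Hypothetical example of the "have it write code" approach: the code can
# validate itself, unlike an opaque numeric answer.
n = 100
brute = sum(k * k for k in range(1, n + 1))  # direct summation
closed = n * (n + 1) * (2 * n + 1) // 6      # known closed form for sum of squares
assert brute == closed                        # two independent methods agree
print(brute)  # 338350
```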

      • F [email protected]

Ok, you have a moderately complex math problem you needed to solve. You gave the problem to 6 LLMs, all paid versions. All 6 get the same numbers. Would you trust the answer?

[email protected] · #22

If you really want to see how good they are, have them do the full calculation for 30! (30 factorial) and see how close they get to the real number.
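
For reference, the exact value is trivial to produce outside an LLM; one line of Python gives the number to check the models against:

```python
import math

# Exact 30!, computed with arbitrary-precision integer arithmetic:
print(math.factorial(30))  # 265252859812191058636308480000000
```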

        • F [email protected]

Ok, you have a moderately complex math problem you needed to solve. You gave the problem to 6 LLMs, all paid versions. All 6 get the same numbers. Would you trust the answer?

[email protected] · #23

This is a really weird premise. Doing the same thing on 6 models is just not worth it, especially when Wolfram Alpha exists and is far more trustworthy and speedy.

          • F [email protected]

Ok, you have a moderately complex math problem you needed to solve. You gave the problem to 6 LLMs, all paid versions. All 6 get the same numbers. Would you trust the answer?

[email protected] · #24

LLMs don't and can't do math. They don't calculate anything; that's just not how they work. Instead, they do this:

2 + 2 = ? What comes after that? Oh, I remember! It's '4'!

It could be right, it could be wrong. If there's enough of a pattern in the training data, it could remember the correct answer. Otherwise it'll just place a plausible-looking value there (behavior known as AI hallucination). So you cannot "trust" it.
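
A toy sketch of that behavior (the counts here are made up): the model emits whatever continuation was most frequent after similar text, with no arithmetic involved.

```python
from collections import Counter

# Hypothetical co-occurrence counts for text following "2 + 2 =" in training data.
continuations = Counter({"4": 9000, "5": 120, "22": 40})

# The "answer" is the most frequent continuation, not a computed sum.
answer, _ = continuations.most_common(1)[0]
print("2 + 2 =", answer)  # prints 4 because it's common, not because it's calculated
```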

            • O [email protected]

No, once I tried to do binary calc with ChatGPT and it kept giving me wrong answers. Good thing I had some unit tests around that part, so I realised quickly it was lying.

[email protected] · #25

Just yesterday I was fiddling around with a logic test in Python. I wanted to see how well DeepSeek could analyze the intro line of a for loop. It properly identified what the loop did in its description, but when it moved on to giving examples it contradicted itself, and it took 3 or 4 replies before it realized that it had contradicted itself.
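
As an illustration (not the exact loop from that test), this is the kind of for-loop header that trips models up: describing it is easy, but enumerating the values correctly is where contradictions creep in.

```python
# Easy to describe ("counts down from 10 by 2"), easy to mis-enumerate:
for i in range(10, 0, -2):
    print(i)  # 10, 8, 6, 4, 2 -- stops before reaching 0
```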

              • F [email protected]

It checked out. But all six getting the same is likely incorrect?

[email protected] · #26

Yes. All six are likely to be incorrect.

Similarly, you could ask six psychologists a subtle quantum mechanics question, and all six may well give you the same answer. You still should not trust that answer.

The way that LLMs correlate and gather answers is particularly unsuited to mathematics.

Edit: In contrast, the average psychologist is much better prepared to answer a quantum mechanics question than an average LLM is to answer a math or counting question.

                • O [email protected]

No, once I tried to do binary calc with ChatGPT and it kept giving me wrong answers. Good thing I had some unit tests around that part, so I realised quickly it was lying.

[email protected] · #27

Yes, more people need to realize it's just a search engine with natural-language input and output. LLM output should at least include citations.

                  • F [email protected]

But if you gave the problem to all the top models and they all got the same answer, is it still likely to be incorrect? I checked 6 models, a bunch of times, from different accounts. I was testing it, seeing if it's possible. With all that, in others' opinions? I actually checked over a hundred times, and each got the same numbers.

[email protected] · #28

What if there's a popular joke that relies on bad math, and it happens to match your question? Then the agreement is understandable and no indication of accuracy. Why use a tool with known issues, plus the overhead of querying six of them, instead of a decent tool like Wolfram Alpha?

                    • F [email protected]

Ok, you have a moderately complex math problem you needed to solve. You gave the problem to 6 LLMs, all paid versions. All 6 get the same numbers. Would you trust the answer?

[email protected] · #29

                      You cannot trust LLMs. Period.

                      They are literally hallucination machines that just happen to be correct sometimes.

                      • F [email protected]

Ok, you have a moderately complex math problem you needed to solve. You gave the problem to 6 LLMs, all paid versions. All 6 get the same numbers. Would you trust the answer?

[email protected] · #30

How trustworthy the answer is depends on knowing where the answers come from, which is unknowable. If the probability of the answer being generated from the original problem is high because it occurred in many different places in the training data, then maybe it's correct. Or maybe everyone who came up with that answer was wrong in the same way, and that's why there is so much correlation. Or perhaps the probability match is simply because lots of math problems tend toward similar answers.

The core issue is that the LLM is not thinking or reasoning about the problem itself, so trusting it with anything amounts to assuming it is more likely to be right than wrong. In some areas that's a safe assumption; in others it's a terrible one.

                        • F [email protected]

But if you gave the problem to all the top models and they all got the same answer, is it still likely to be incorrect? I checked 6 models, a bunch of times, from different accounts. I was testing it, seeing if it's possible. With all that, in others' opinions? I actually checked over a hundred times, and each got the same numbers.

[email protected] · #31

They could get the right answer 9,999 times out of 10,000, and that one wrong answer is enough to make all the correct answers suspect.

                          • B [email protected]

                            You cannot trust LLMs. Period.

                            They are literally hallucination machines that just happen to be correct sometimes.

[email protected] · #32

                            So are most people.

[email protected] · #33

If the LLM is part of a modern framework, I would expect it to be calling out to Wolfram Alpha (or a similar specialized math solver) via an API to get the answer for you, for that matter.
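
A minimal sketch of that routing idea, assuming nothing about any specific framework's API (the function and matching rule here are hypothetical): recognizable math gets handed to an exact solver instead of being left to the model.

```python
import math
import re

def route(question: str) -> str:
    """Hypothetical tool router: hand recognizable math to an exact solver."""
    m = re.fullmatch(r"(\d+)!", question.strip())
    if m:  # a factorial request -- compute it exactly instead of guessing
        return str(math.factorial(int(m.group(1))))
    return "no tool matched; a raw LLM answer here would be unverified"

print(route("30!"))
```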

[email protected] · #34

I'm a little confused after listening to a podcast with... damn, I can't remember his name. He's English. They call him the godfather of AI. A pioneer.

Well, he believes that GPT-2 through GPT-4 were major breakthroughs in artificial intelligence. He specifically said ChatGPT is intelligent, that some type of reasoning is taking place, and that the end of humanity could come anywhere from a year to 50 years away. This is the fella who imagined a neural net mapped on the human brain, and he says it is doing much more. Who should I listen to? He didn't say some hidden AI. HE SAID CHAT GPT. Honestly, no offense, I just don't understand this epic scenario on one side and totally nothing on the other.

                                • P [email protected]

What if there's a popular joke that relies on bad math, and it happens to match your question? Then the agreement is understandable and no indication of accuracy. Why use a tool with known issues, plus the overhead of querying six of them, instead of a decent tool like Wolfram Alpha?

[email protected] · #35

I did, dozens of times. Same calculations.

                                  • F [email protected]

I'm a little confused after listening to a podcast with... damn, I can't remember his name. He's English. They call him the godfather of AI. A pioneer.

Well, he believes that GPT-2 through GPT-4 were major breakthroughs in artificial intelligence. He specifically said ChatGPT is intelligent, that some type of reasoning is taking place, and that the end of humanity could come anywhere from a year to 50 years away. This is the fella who imagined a neural net mapped on the human brain, and he says it is doing much more. Who should I listen to? He didn't say some hidden AI. HE SAID CHAT GPT. Honestly, no offense, I just don't understand this epic scenario on one side and totally nothing on the other.

[email protected] · #36

Anyone with a stake in the development of AI is lying to you about how good the models are and how soon they will be able to do X.

They have to be lying, because the truth is that LLMs are terrible. They can't reason at all. When they perform well on benchmarks, it's because every benchmark contains questions that are in the LLMs' training data. If you burn trillions of dollars and have nothing to show for it, you lie so people keep giving you money.

                                    https://arxiv.org/html/2502.14318

                                    However, the extent of this progress is frequently exaggerated based on appeals to rapid increases in performance on various benchmarks. I have argued that these benchmarks are of limited value for measuring LLM progress because of problems of models being over-fit to the benchmarks, lack real-world relevance of test items, and inadequate validation for whether the benchmarks predict general cognitive performance. Conversely, evidence from adversarial tasks and interpretability research indicates that LLMs consistently fail to learn the underlying structure of the tasks they are trained on, instead relying on complex statistical associations and heuristics which enable good performance on test benchmarks but generalise poorly to many real-world tasks.

                                    • F [email protected]

It checked out. But all six getting the same is likely incorrect?

[email protected] · #37

If all 6 got the same answer multiple times, then that means your query correlated very strongly with that reply in the training data all of them used. Does that mean it's correct? Well, no. It could mean there were a bunch of incorrect examples of your query that they used to come up with that answer. It could mean that the examples they're working from seem to follow a pattern that your problem fits into, but the correct answer doesn't actually fit that seemingly obvious pattern. And yes, there's a decent chance it could actually be correct. The problem is that the only way to eliminate those other, still quite likely possibilities is to actually do the problem, at which point asking the LLM accomplished nothing.
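
In practice, "actually do the problem" can be as cheap as a few lines of exact arithmetic; the claimed answer below is a hypothetical example, not one from the thread.

```python
from fractions import Fraction

# Suppose a model claims 1/3 + 1/6 = 1/2. Verify exactly rather than trusting it.
claimed = Fraction(1, 2)
computed = Fraction(1, 3) + Fraction(1, 6)
assert computed == claimed, f"wrong: exact result is {computed}"
print("verified:", computed)  # verified: 1/2
```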

                                      • F [email protected]

I'm a little confused after listening to a podcast with... damn, I can't remember his name. He's English. They call him the godfather of AI. A pioneer.

Well, he believes that GPT-2 through GPT-4 were major breakthroughs in artificial intelligence. He specifically said ChatGPT is intelligent, that some type of reasoning is taking place, and that the end of humanity could come anywhere from a year to 50 years away. This is the fella who imagined a neural net mapped on the human brain, and he says it is doing much more. Who should I listen to? He didn't say some hidden AI. HE SAID CHAT GPT. Honestly, no offense, I just don't understand this epic scenario on one side and totally nothing on the other.

[email protected] · #38

One step might be to try to understand the basic principles behind what makes an LLM function. The YouTube channel 3Blue1Brown has at least one good video on transformers and how they work, and perhaps that will help you understand that "reasoning" is a very broad term that doesn't necessarily mean thinking. What goes on inside an LLM is fascinating, and it's amazing what manages to come out that's useful, but like any tool it can't be used for everything well, if at all.

[email protected] · #39

                                          I'll ask AI what's really going on lolool.
