agnos.is Forums


Can we trust LLM calculations?

Ask Lemmy · 69 Posts, 48 Posters
  • F [email protected]

    It checked out. But, all six getting the same is likely incorrect?.

    [email protected] · #13

    Don't know. I've never asked any of them a maths question.

    How costly is it to be wrong? You seem to care enough to ask people on the Internet so it suggests that it's fairly costly. I'd not trust them.

  • [email protected]

    Really, all six models? Likely incorrect?

    [email protected] · #14

      That wasn’t the question. The question was whether you should trust the number and the answer is no. It could be correct or it could be incorrect. There’s not enough data to determine it.

      LLMs work as predictive models. If you ask 10 people to estimate the height of a tree, and 8/10 estimate that it's 10 ft tall while 2/10 estimate that it's 8 ft tall, the most likely LLM answer is that it's 10 ft tall. It doesn't matter that, if you actually go and measure the tree, it's actually 15 ft tall. The LLM will likely report 10 ft.
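      That mode-picking behaviour can be illustrated with a toy sketch using the numbers from the tree example (this is an analogy for the point being made, not real model internals):

```python
from collections import Counter

# Toy illustration: a predictor trained on people's guesses returns the
# most common guess, not the measured truth.
guesses = ["10 ft"] * 8 + ["8 ft"] * 2   # what the "training data" says
measured = "15 ft"                        # what a tape measure says

most_likely = Counter(guesses).most_common(1)[0][0]
print(most_likely)  # prints "10 ft" - the popular answer, not the true one
assert most_likely != measured
```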

    • [email protected]

      But if you gave the problem to all the top models and got the same answer? Is it still likely incorrect? I checked 6 models, a bunch of times, from different accounts. I was testing to see if it's possible. With all that, in others' opinions? I actually checked over a hundred times each and got the same numbers.

      [email protected] · #15

        My use case was, I expect, easier and simpler, so I was able to write automated tests to validate the logic of incrementing specific parts of a binary number, and found that the expected test values the LLM produced were wrong.

        So if it's possible to use some kind of automation to verify LLM results for your problem, you can be confident in your answer. But generally LLMs tend to make up shit and sound confident about it.
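        A sketch of that kind of automated check, with a hypothetical 8-bit layout (low nibble as a wrapping counter, high nibble as flags), might look like:

```python
# Sketch of verifying answers with automated tests instead of trusting an LLM.
# The 8-bit layout here is hypothetical: low nibble = counter, high nibble = flags.

def increment_counter(value: int) -> int:
    """Increment only the low 4 bits, wrapping at 16; leave high bits untouched."""
    high = value & 0xF0
    low = (value + 1) & 0x0F
    return high | low

# Ground truth computed independently, not taken from any LLM's answer.
cases = {
    0b0000_0000: 0b0000_0001,
    0b0000_1111: 0b0000_0000,  # low nibble wraps to zero
    0b1010_1111: 0b1010_0000,  # high nibble preserved across the wrap
}

for given, expected in cases.items():
    assert increment_counter(given) == expected, f"failed for {given:#010b}"
print("all cases pass")
```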

      • [email protected]

        Ok, you have a moderately complex math problem you need to solve. You gave the problem to 6 LLMs, all paid versions. All 6 get the same numbers. Would you trust the answer?

        scrubbles@poptalk.scrubbles.tech · #16

          No. Dear God, no. LLMs are not computers. They are just prediction machines: they predict that the next value is probably this value. There is no actual math there.

      • [email protected]

        Ok, you have a moderately complex math problem you need to solve. You gave the problem to 6 LLMs, all paid versions. All 6 get the same numbers. Would you trust the answer?

        [email protected] · #17

            I wouldn't bother. If I really had to ask a bot, Wolfram Alpha is there as long as I can ask it without an AI meddling with my question.

            E: To clarify, just because one AI, or six, gets the same answer that I can independently verify as correct for a simpler question does not mean I can trust it for an arbitrary math question, no matter how many AIs arrive at the same answer. There's always the possibility the AI will stumble on a logical flaw, exemplified by the "number of r's in strawberry" example.

      • [email protected]

        Ok, you have a moderately complex math problem you need to solve. You gave the problem to 6 LLMs, all paid versions. All 6 get the same numbers. Would you trust the answer?

        [email protected] · #18

              Would you trust six mathematicians who claimed to have solved a problem by intuition, but couldn’t prove it?

              That’s not how mathematics works: if you have to “trust” the answer, it isn’t even math.

      • [email protected]

        It was about all six models getting the same answer from different accounts. I was testing it. Over a hundred runs each, same numbers.

        [email protected] · #19

                Irrelevant.

                LLMs are incapable of reasoning. At the core level, it is a physical impossibility.

      • [email protected]

        Short answer: no.

        Long answer: they are still (mostly) statistics-based and can't do real math. You can use the answers from LLMs as a starting point, but you have to rigorously verify the answers they give.

        [email protected] · #20

          The whole "two r's in strawberry" thing is enough of an argument for me. If things like that happen at such a low level, it's completely impossible that it won't make mistakes with problems that are exponentially more complicated than that.
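          For reference, the famous check takes two lines of Python:

```python
# Count the letter 'r' in "strawberry" - the check LLMs famously got wrong.
word = "strawberry"
count = word.count("r")
print(count)  # prints 3, not the 2 that several models used to insist on
```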

      • [email protected]

        Ok, you have a moderately complex math problem you need to solve. You gave the problem to 6 LLMs, all paid versions. All 6 get the same numbers. Would you trust the answer?

        [email protected] · #21

                    Probably, depending on the context. It is possible that all 6 models were trained on the same misleading data, but not very likely in general.

                    Number crunching isn't an obvious LLM use case, though. Depending on the task, having it create code to crunch the numbers, or a step-by-step tutorial on how to derive the formula, would be my preference.
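          That workflow, asking for a formula or code rather than a final number, can be sketched like this. The closed form for 1 + 2 + … + n stands in for whatever the model hands back (the names are illustrative, not from any real model output), and the brute-force loop is the independent check:

```python
def llm_suggested_formula(n: int) -> int:
    # Hypothetical closed form an LLM might hand back for 1 + 2 + ... + n.
    return n * (n + 1) // 2

def brute_force(n: int) -> int:
    # Independent check that doesn't rely on the LLM at all.
    return sum(range(1, n + 1))

# Trust the formula only after it survives the independent check.
for n in range(1000):
    assert llm_suggested_formula(n) == brute_force(n)
print("formula verified for n < 1000")
```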

      • [email protected]

        Ok, you have a moderately complex math problem you need to solve. You gave the problem to 6 LLMs, all paid versions. All 6 get the same numbers. Would you trust the answer?

        [email protected] · #22

          If you really want to see how good they are, have them do the full calculation for 30! (30 factorial) and see how close they get to the real number.
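          For comparison, Python's arbitrary-precision integers give the exact reference value in one call:

```python
import math

# Exact value of 30! to compare an LLM's digit-by-digit answer against.
print(math.factorial(30))  # 265252859812191058636308480000000
```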

      • [email protected]

        Ok, you have a moderately complex math problem you need to solve. You gave the problem to 6 LLMs, all paid versions. All 6 get the same numbers. Would you trust the answer?

        aatube@kbin.melroy.org · #23

          This is a really weird premise. Doing the same thing on 6 models is just not worth it, especially when Wolfram Alpha exists and is far more trustworthy and speedy.

      • [email protected]

        Ok, you have a moderately complex math problem you need to solve. You gave the problem to 6 LLMs, all paid versions. All 6 get the same numbers. Would you trust the answer?

        [email protected] · #24

                          LLMs don't and can't do math. They don't calculate anything, that's just not how they work. Instead, they do this:

                          2 + 2 = ? What comes after that? Oh, I remember! It's '4'!

          It could be right, it could be wrong. If there's enough of a pattern in the training data, it can remember the correct answer. Otherwise it'll just place a plausible-looking value there (behavior known as AI hallucination). So you cannot "trust" it.

      • [email protected]

        No, once I tried to do binary calc with ChatGPT and it kept giving me wrong answers. Good thing I had some unit tests around that part, so I realised quickly it was lying.

        pika@sh.itjust.works · #25

          Just yesterday I was fiddling around with a logic test in Python. I wanted to see how well DeepSeek could analyze the intro line of a for loop. It properly identified what the line did in its description, but when it moved on to giving examples it contradicted itself, and it took 3 or 4 replies before it realized that it had contradicted itself.

      • [email protected]

        It checked out. But is all six getting the same answer likely incorrect?

        [email protected] · #26

                              Yes. All six are likely to be incorrect.

                              Similarly, you could ask a subtle quantum mechanics question to six psychologists, and all six may well give you the same answer. You still should not trust that answer.

                              The way that LLMs correlate and gather answers is particularly unsuited to mathematics.

          Edit: In contrast, the average psychologist is much better prepared to answer a quantum mechanics question than an average LLM is to answer a math or counting question.

      • [email protected]

        No, once I tried to do binary calc with ChatGPT and it kept giving me wrong answers. Good thing I had some unit tests around that part, so I realised quickly it was lying.

        dan1101@lemmy.world · #27

          Yes, more people need to realize it's just a search engine with natural-language input and output. LLM output should at least include citations.

      • [email protected]

        But if you gave the problem to all the top models and got the same answer? Is it still likely incorrect? I checked 6 models, a bunch of times, from different accounts. I was testing to see if it's possible. With all that, in others' opinions? I actually checked over a hundred times each and got the same numbers.

        [email protected] · #28

          What if there is a popular joke that relies on bad math and happens to match your question? Then the agreement is understandable and no indication of accuracy. Why use a tool with known issues, plus the overhead of querying six, instead of a decent tool like Wolfram Alpha?

      • [email protected]

        Ok, you have a moderately complex math problem you need to solve. You gave the problem to 6 LLMs, all paid versions. All 6 get the same numbers. Would you trust the answer?

        [email protected] · #29

                                    You cannot trust LLMs. Period.

                                    They are literally hallucination machines that just happen to be correct sometimes.

      • [email protected]

        Ok, you have a moderately complex math problem you need to solve. You gave the problem to 6 LLMs, all paid versions. All 6 get the same numbers. Would you trust the answer?

        rhaedas@fedia.io · #30

                                      How trustable the answer is depends on knowing where the answers come from, which is unknowable. If the probability of the answers being generated from the original problem are high because it occurred in many different places in the training data, then maybe it's correct. Or maybe everyone who came up with the answer is wrong in the same way and that's why there is so much correlation. Or perhaps the probability match is simply because lots of math problems tend towards similar answers.

          The core issue is that the LLM is not thinking or reasoning about the problem itself, so trusting it with anything amounts to assuming that the likelihood of it being right rather than wrong is high. In some areas that's a safe assumption; in others it's a terrible one.

      • [email protected]

        But if you gave the problem to all the top models and got the same answer? Is it still likely incorrect? I checked 6 models, a bunch of times, from different accounts. I was testing to see if it's possible. With all that, in others' opinions? I actually checked over a hundred times each and got the same numbers.

        [email protected] · #31

                                        They could get the right answer 9999 times out of 10000 and that one wrong answer is enough to make all the correct answers suspect.

      • [email protected]

        You cannot trust LLMs. Period.

        They are literally hallucination machines that just happen to be correct sometimes.

        rivalarrival@lemmy.today · #32

                                          So are most people.
