Can we trust LLM calculations?
-
OK, you have a moderately complex math problem you need to solve. You give the problem to 6 LLMs, all paid versions. All 6 get the same numbers. Would you trust the answer?
-
Ok, you have a moderately complex math problem you needed to solve. You gave the problem to 6 LLMS all paid versions. All 6 get the same numbers. Would you trust the answer?
No. I once tried to do binary calculations with ChatGPT and it kept giving me wrong answers. Good thing I had some unit tests around that part, so I realised quickly it was lying.
-
Ok, you have a moderately complex math problem you needed to solve. You gave the problem to 6 LLMS all paid versions. All 6 get the same numbers. Would you trust the answer?
No, thank you for coming to my TED Talk.
-
Ok, you have a moderately complex math problem you needed to solve. You gave the problem to 6 LLMS all paid versions. All 6 get the same numbers. Would you trust the answer?
Short answer: no.
Long answer: They are still (mostly) statistics-based and can't do real math. You can use the answers from LLMs as a starting point, but you have to rigorously verify the answers they give.
-
Ok, you have a moderately complex math problem you needed to solve. You gave the problem to 6 LLMS all paid versions. All 6 get the same numbers. Would you trust the answer?
Why would I bother?
Calculators exist, logic exists, so no... LLMs are a laughably bad fit for directly doing math. They are bullshit engines: they cannot "store" a value without fundamentally exposing it to hallucinating tendencies, which is the worst property a calculator could possibly have.
-
no, once i tried to do binary calc with chat gpt and he keot giving me wrong answers. good thing i had sone unit tests around that part so realised quickly its lying
But what if you gave the problem to all the top models and got the same answer? Is it still likely to be incorrect? I checked 6. I checked a bunch of times, from different accounts. I was testing it. I'm asking whether, in others' opinions, a wrong answer is still possible given all that. I actually checked over a hundred times, and each got the same numbers.
-
Ok, you have a moderately complex math problem you needed to solve. You gave the problem to 6 LLMS all paid versions. All 6 get the same numbers. Would you trust the answer?
Vibe math. No thank you
-
No, thank you for coming to my TED Talk.
Really, all six models? Likely incorrect?
-
Ok, you have a moderately complex math problem you needed to solve. You gave the problem to 6 LLMS all paid versions. All 6 get the same numbers. Would you trust the answer?
Even when using a calculator, Wolfram Alpha, or similar tools, I don't trust the answer unless it passes a few sanity checks. Frequently I am the source of error, and no LLM can compensate for that.
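One cheap kind of sanity check is to substitute a claimed answer back into the original problem. The sketch below assumes a made-up equation, x^2 - 5x + 6 = 0, and made-up claimed roots. Neither comes from this thread, and the names `residual` and `passes_check` are only illustrative.

```python
from fractions import Fraction


def residual(x):
    """Left-hand side of the hypothetical equation x^2 - 5x + 6 = 0."""
    x = Fraction(x)  # exact arithmetic, so no floating-point rounding
    return x * x - 5 * x + 6


def passes_check(claimed_root):
    """A claimed root passes only if it makes the residual exactly zero."""
    return residual(claimed_root) == 0


# Hypothetical answers, e.g. copied from a tool or typed in by hand.
claimed = [2, 3, 4]
results = {root: passes_check(root) for root in claimed}
print(results)
```

Checking an answer this way is usually much cheaper than deriving it, and it catches mistakes no matter who or what made them.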
-
Why would I bother?
Calculators exist, logic exists, so no... LLMs are a laughably bad fit for directly doing math, they are bullshit engines they cannot "store" a value without fundamentally exposing it to hallucinating tendencies which is the worst property a calculator could possibly have.
It was about all six models getting the same answer from different accounts. I was testing it. Over a hundred runs each, same numbers.
-
Using a calculator or wolfram alpha or similar tools i don't trust the answer unless it passes a few sanity checks. Frequently I am the source of error and no LLM can compensate for that.
It checked out. But all six getting the same answer is likely incorrect?
-
It was about all six models getting the same answer from different accounts. I was testing it. Over a hundred each same numbers
Right, so because LLMs are atrocious at actually precisely carrying out logic operations, the solution was likely to just throw a normal calculator inside the AI, make the AI use the calculator, and then turn around and handwave that the entire thing is AI.
So... you could just skip the bullshit and use a calculator, the AI just repackages the same answer with more boilerplate bullshit.
Wolfram Alpha is the non-bullshit version of this.
-
It checked out. But, all six getting the same is likely incorrect?.
Don't know. I've never asked any of them a maths question.
How costly is it to be wrong? You seem to care enough to ask people on the Internet so it suggests that it's fairly costly. I'd not trust them.
-
Really all six models. ? Likely incorrect?
That wasn’t the question. The question was whether you should trust the number and the answer is no. It could be correct or it could be incorrect. There’s not enough data to determine it.
LLMs work as predictive models. If you ask 10 people to estimate the height of a tree, and 8/10 estimate that it’s 10 ft tall while 2/10 estimate that it’s 8 ft tall, the most likely LLM answer is that it’s 10 ft tall. It doesn’t matter if you go and measure the tree and it’s actually 15 ft tall. The LLM will still likely report 10.
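The tree analogy can be put in a few lines of code. This is only a sketch of the analogy, using the numbers from the post, not a description of how an LLM works internally. It shows that the most common answer can be unanimous-looking and still be far from the measured value.

```python
from collections import Counter

# Estimates from the analogy: 8 people say 10 ft, 2 people say 8 ft.
estimates = [10] * 8 + [8] * 2
measured_height = 15  # what a tape measure would say, in feet

# The "consensus" answer is simply the most frequent estimate.
consensus, votes = Counter(estimates).most_common(1)[0]
error = measured_height - consensus

print(f"consensus: {consensus} ft ({votes}/{len(estimates)} votes)")
print(f"off by: {error} ft")
```

Agreement only tells you the estimates share a source or a bias. It says nothing about the tree until someone measures it.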
-
But, if you ran, gave the problem to all the top models and got the same? Is it still likely an incorrect answer? I checked 6. I checked a bunch of times. Different accounts. I was testing it. I'm seeing if its possible with all that in others opinions I actually had to check over a hundred times each got the same numbers.
My use case was, I expect, easier and simpler, so I was able to write automated tests to validate the logic of incrementing specific parts of a binary number, and I found that the expected test values the LLM produced were wrong.
So if it's possible to use some kind of automation to verify LLM results for your problem, you can be confident in your answer. But generally LLMs tend to make up shit and sound confident about it.
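A minimal sketch of that kind of automated check, under assumptions: the actual code and test values are not in this thread, so `increment_field`, `check_claims`, and the sample values below are hypothetical. The idea is to compute each result deterministically and compare it against the "expected" value an LLM supplied.

```python
def increment_field(value: int, shift: int, width: int) -> int:
    """Increment the `width`-bit field starting at bit `shift`, wrapping on overflow."""
    field_mask = (1 << width) - 1
    mask = field_mask << shift
    field = (value & mask) >> shift
    field = (field + 1) & field_mask  # wrap around, no carry into higher bits
    return (value & ~mask) | (field << shift)


def check_claims(claims):
    """Return every claim whose expected value disagrees with the computed one."""
    return [
        claim for claim in claims
        if increment_field(claim[0], claim[1], claim[2]) != claim[3]
    ]


# Hypothetical LLM-supplied cases: (value, shift, width, expected)
claims = [
    (0b0000_0101, 0, 4, 0b0000_0110),  # low nibble 5 becomes 6
    (0b0000_1111, 0, 4, 0b0001_0000),  # claims a carry, but the field should wrap
    (0b0011_0000, 4, 2, 0b0000_0000),  # 2-bit field at bits 4-5 wraps from 3 to 0
]

wrong = check_claims(claims)
print(f"{len(wrong)} of {len(claims)} claimed values are wrong")
```

Here the second claim fails, because a 4-bit field holding 15 wraps to 0 instead of carrying into bit 4. That is the kind of mismatch a test suite surfaces immediately.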
-
Ok, you have a moderately complex math problem you needed to solve. You gave the problem to 6 LLMS all paid versions. All 6 get the same numbers. Would you trust the answer?
No. Dear God, no. LLMs are not computers. They are just prediction machines. They predict that the next value is probably this value. There is no actual math there.
-
Ok, you have a moderately complex math problem you needed to solve. You gave the problem to 6 LLMS all paid versions. All 6 get the same numbers. Would you trust the answer?
I wouldn't bother. If I really had to ask a bot, Wolfram Alpha is there as long as I can ask it without an AI meddling with my question.
E: To clarify, just because one AI or six will get the same answer that I can independently verify as correct for a simpler question, does not mean I can trust it for any arbitrary math question even if however many AIs arrive at the same answer. There's often the possibility the AI will stumble upon a logical flaw, exemplified by the "number of rs in strawberry" example.
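For contrast, the strawberry question is trivial for ordinary deterministic code. A tiny sketch (the helper name `count_letter` is just illustrative):

```python
def count_letter(word: str, letter: str) -> int:
    """Count case-insensitive occurrences of a single letter in a word."""
    return word.lower().count(letter.lower())


r_count = count_letter("strawberry", "r")
print(r_count)
```

The same input always gives the same count, which is the property you want from anything doing arithmetic.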
-
Ok, you have a moderately complex math problem you needed to solve. You gave the problem to 6 LLMS all paid versions. All 6 get the same numbers. Would you trust the answer?
Would you trust six mathematicians who claimed to have solved a problem by intuition, but couldn’t prove it?
That’s not how mathematics works: if you have to “trust” the answer, it isn’t even math.
-
It was about all six models getting the same answer from different accounts. I was testing it. Over a hundred each same numbers
Irrelevant.
LLMs are incapable of reasoning. At the core level, it is a physical impossibility.
-
short answer: no.
Long Answer: They are still (mostly) statisics based and can't do real math. You can use the answers from LLMs as starting point, but you have to rigerously verify the answers they give.
The whole "two r's in strawberry" thing is enough of an argument for me. If things like that happen at such a low level, it's completely impossible that it won't make mistakes with problems that are exponentially more complicated than that.