Why I am not impressed by A.I.

[email protected]

So for something you can't objectively evaluate?
Looking at Apple's garbage generator, LLMs aren't even good at summarising.

[email protected]

Yeah, and you know, I always hated this: screwdrivers make really bad hammers.

[email protected]

That was this reality. Very briefly. Remember AI Dungeon and the other clones that were popular prior to the mass ML marketing campaigns of the last 2 years?

[email protected]

I asked Gemini if the Quest has an SD slot. It doesn't, but Gemini said it did. Checking the source, it was pulling info from the Vive user manual.

[email protected]

You rang?

[email protected]

This was an interesting read, thanks for sharing.

[email protected]

Just don't expect them to always tell the truth, or to actually be human-like

I think the point of the post is to call out exactly that: people preaching AI as replacing humans

[email protected]

That's because it wasn't originally called AI. It was called an LLM. Techbros trying to sell it and articles wanting to fan the flames started calling it AI, and eventually it became common usage. No one in the field seriously calls it AI; they generally save that term for general AI, or at least narrow AI, and an LLM is neither.

[email protected]

Fair enough - sounds like they might not be ready for prime time though.

Oh well, at least while the bugs get ironed out we're not using them for anything important

[email protected]

And apparently, they still can't get an accurate result with such a basic query.

And yet...
https://futurism.com/openai-signs-deal-us-government-nuclear-weapon-security

[email protected]

Here's my guess, aside from highlighted token issues:

We all know LLMs train on human-generated data. And when we ask something like "how many R's" or "how many L's" are in a given word, we don't mean to count them all - we normally mean something like "how many of that letter are there in a row, so I can spell it right".

Yes, the word "strawberry" has 3 R's. But what most people are interested in is whether it is "strawberry" or "strawbery", and their "how many R's" refers to this exactly, not the entire word.
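To make the two readings of the question concrete, here is a small sketch. The function names `total_count` and `longest_run` are made up for illustration; they just contrast "count every R" with "how many R's in a row".

```python
from itertools import groupby


def total_count(word: str, letter: str) -> int:
    """Count every occurrence of `letter` in `word` (the literal reading)."""
    return word.lower().count(letter.lower())


def longest_run(word: str, letter: str) -> int:
    """Length of the longest consecutive run of `letter` (the spelling reading)."""
    target = letter.lower()
    runs = [len(list(group)) for ch, group in groupby(word.lower()) if ch == target]
    return max(runs, default=0)


print(total_count("strawberry", "r"))  # 3
print(longest_run("strawberry", "r"))  # 2
```

Under the second reading, "two" is a sensible answer to "how many R's", which is the guess being made here about the training data.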

[email protected]

But to be fair, as people we would not ask "how many Rs does strawberry have", but "with how many Rs do you spell strawberry" or "do you spell strawberry with 1 R or 2 Rs"

[email protected]

That happens when you do not understand what an LLM is, or what its use cases are.

This is like not being impressed by a calculator because it cannot give a word synonym.

[email protected]

They are not random per se. They are statistical, with some degree of randomization.
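A toy sketch of what "statistical with some randomization" can look like: the model scores candidate next tokens, and a temperature setting controls how much randomness goes into picking one. The scores below are invented for illustration and do not come from any real model.

```python
import math
import random


def sample_token(logits: dict[str, float], temperature: float, rng: random.Random) -> str:
    """Pick one token from hypothetical scores, using softmax with temperature."""
    if temperature <= 0:
        # No randomness at all: always take the highest-scoring token.
        return max(logits, key=logits.get)
    scaled = {tok: score / temperature for tok, score in logits.items()}
    top = max(scaled.values())
    # Subtract the max before exponentiating for numerical stability.
    weights = [math.exp(value - top) for value in scaled.values()]
    return rng.choices(list(scaled), weights=weights, k=1)[0]


# Made-up scores for possible answers to "how many R's in strawberry?"
toy_logits = {"two": 2.0, "three": 1.5, "four": -1.0}
rng = random.Random(0)
print(sample_token(toy_logits, 0.0, rng))                       # always "two"
print([sample_token(toy_logits, 1.0, rng) for _ in range(5)])   # varies with the seed
```

The most likely token wins most of the time, but not every time, so the output is neither fully random nor fully deterministic.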

[email protected]

Exactly. The naming of the technology would make you assume it's intelligent. It's not.

[email protected]

What situations are you thinking of that require reasoning?

I've used LLMs to create software I needed but couldn't find online.

[email protected]

"My hammer is not well suited to cut vegetables"

There is so much to say about AI, can we move on from "it can't count letters and do math"?

[email protected]

Sure, maybe it’s not capable of producing the correct answer, which is fine. But it should say “As an LLM, I cannot answer questions like this” instead of just making up an answer.

[email protected]

Sure, but I definitely wouldn’t confidently answer “two”.

[email protected]

I have thought a lot about it. The LLM per se would not know if the question is answerable or not, as it doesn't know if its output is good or bad.

So there are various approaches to this issue:

1. The classic approach, and the one used for censoring: keywords. When the LLM gets a certain keyword, or can derive one by digesting the text input, it gives back a hard-coded answer. The problem is that while censored topics are limited, hard-to-answer questions are unlimited, so it is hard to hard-code them all.
2. Self-checking answers. For every question, the LLM could process it 10 times with different seeds, then analyze the results and see if they are equivalent. If they are not, it just answers that it's unsure. The problem: resource usage multiplies. And for some questions, like the one in the post, it's possible that the multiple randomized answers give equivalent results, so it would still have a decent failure rate.
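A rough sketch of the self-check idea, with its failure mode included. Everything here is hypothetical: `generate(prompt, seed)` stands in for whatever call produces one answer from a model, and the two fake generators only exist to show the behaviour.

```python
from collections import Counter
from typing import Callable

UNSURE = "I'm not sure about the answer."


def self_checked_answer(
    generate: Callable[[str, int], str],  # hypothetical model call: (prompt, seed) -> answer
    prompt: str,
    n_samples: int = 10,
    min_agreement: float = 0.8,
) -> str:
    """Ask the same question with different seeds; answer only if the runs mostly agree."""
    answers = [generate(prompt, seed).strip().lower() for seed in range(n_samples)]
    best, votes = Counter(answers).most_common(1)[0]
    if votes / n_samples >= min_agreement:
        return best
    return UNSURE


def flaky_model(prompt: str, seed: int) -> str:
    # Stand-in for a model whose answer flips between runs.
    return ["two", "three"][seed % 2]


def confidently_wrong_model(prompt: str, seed: int) -> str:
    # Stand-in for a model that gives the same wrong answer on every seed.
    return "two"


question = "How many R's are in strawberry?"
print(self_checked_answer(flaky_model, question))              # I'm not sure about the answer.
print(self_checked_answer(confidently_wrong_model, question))  # two
```

The second call shows the failure rate mentioned above: if every seed agrees on the same wrong answer, the check passes anyway, and it costs `n_samples` times the compute either way.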

agnos.is Forums