agnos.is Forums


I mean, LLM are just next token in sequence predictors.

[email protected] wrote (last edited by [email protected])
#1

I mean, LLMs are just next-token-in-sequence predictors. That's all they do. They take an input sequence, and based on that sequence, they predict the next thing to come. It's not magic.

It's just that, embedded in language (apparently), there is information about meaning, at least as humans interpret it. But these methods can be applied to any kind of data, most of which doesn't come prepackaged with some human interpretation of meaning.

Just to toot my own bugle, here is some output of a transformer model I'm building/working on this very second (as in, it's on my second monitor):

Here I'm using time-series climate, terrain, and satellite data to predict the amount of carbon stored in soils over time. It's the same idea as an LLM, except that instead of trying to predict the next token embedding in sequence, I'm asking the model to predict the next "climate/spectral/terrain/soil" state, in sequence.
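To make the analogy concrete, here's a minimal sketch of that kind of setup, assuming PyTorch; the feature count, layer sizes, and tensor shapes are made up for illustration and this is not the actual model. The only real change from a language model is that the output head regresses the next continuous state vector instead of scoring a vocabulary of tokens.

```python
# Hypothetical sketch, not the author's actual model: an autoregressive
# transformer that predicts the next continuous "climate/spectral/terrain/soil"
# state vector instead of the next token.
import torch
import torch.nn as nn

class NextStatePredictor(nn.Module):
    def __init__(self, n_features=16, d_model=128, n_heads=4, n_layers=3):
        super().__init__()
        self.embed = nn.Linear(n_features, d_model)          # continuous "embedding"
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, n_features)           # regress the next state

    def forward(self, x):                                    # x: (batch, time, features)
        causal = nn.Transformer.generate_square_subsequent_mask(x.size(1))
        h = self.encoder(self.embed(x), mask=causal)         # attend only to the past
        return self.head(h)                                  # one prediction per step

# The target is just the input shifted one step: the same trick as next-token
# prediction, but with mean-squared error instead of cross-entropy over a vocabulary.
model = NextStatePredictor()
x = torch.randn(8, 24, 16)                  # 8 sites, 24 time steps, 16 variables (toy data)
pred = model(x[:, :-1])                     # predict step t+1 from steps <= t
loss = nn.functional.mse_loss(pred, x[:, 1:])
```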

Typical models in this domain can maybe predict at about a 0.35–0.4 R^2^. Here you can see I'm getting a 0.54 R^2^.
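R^2^ here is the usual coefficient of determination. A minimal sketch of how it's computed, as a hypothetical standalone helper rather than anything from the model code:

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)           # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # variance around the mean
    return 1.0 - ss_res / ss_tot
```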

    All ML as we currently know it is basically giant piles of linear algebra and an optimization function.

    Edit:

    Here is a bonus panel that I just made:

(PDSI: Palmer Drought Severity Index, basically wet season vs. dry season.)

[email protected] wrote
#2

All fine and dandy. Do you have an answer to the OP's question?

      • E [email protected]

        All fine and dandy. Do you have an answer to OPs question?

[email protected] wrote
#3

I think others are mostly addressing that issue, which is why I went in a different direction.

It's not really an answerable question, because we don't even have great definitions for human intelligence, let alone for things that are only defined in abstract comparison to that concept.

Transformers, specifically the attention block, are just one step in a long and ongoing tradition of advancement in ML. Check out the paper "Attention Is All You Need"; it was a pretty big deal. So I expect AI development to continue as it has. Things had already improved substantially even without LLMs, but boy howdy, transformers were a big leap. There is much more advancement and focus than before. Maybe that will speed things up. But regardless, we should expect better models and architectures in the future.

Another way to think of this is scale. The energy density of these systems (I think) will become a limiting factor before anything else. This is not to say all of the components of these systems are of the same quality: a node in a transformer is of higher quality than one in a U-Net, and a worm synapse is of lower quality than a human synapse.

But we can still compare the number of connections, without knowing whether they are of equal quality.

So an industry-tier LLM has maybe 0.175 trillion connections. A human brain has about 100x that number of connections. If we believed the connections to be of equal quality, then LLMs would need to be 100x larger to compete with humans (though we know they are already better than most humans on many tests). Keep in mind, a mature tree could have well over 16,000–56,000 trillion connections via its plasmodesmata.

A human brain takes ~20 watts of resting power. An LLM takes about 80 watts per inquiry reference. So an LLM takes quite a bit more energy per connection to run. We're running 100x the connections on 1/4th the power. We would need to see an 800% improvement in LLMs to be energy-equivalent to humans (again, under the assumption of the same "quality" of connection).
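Taking those figures at face value (they're rough estimates from this post, not measurements), the per-connection arithmetic works out as follows:

```python
# Back-of-envelope arithmetic using the rough figures above; all numbers are
# estimates quoted in this post, not measurements.
llm_connections = 0.175e12                  # ~0.175 trillion LLM connections
brain_connections = 100 * llm_connections   # ~100x more in a human brain
llm_watts = 80.0                            # claimed draw per LLM inquiry
brain_watts = 20.0                          # resting brain power

llm_w_per_conn = llm_watts / llm_connections
brain_w_per_conn = brain_watts / brain_connections

print(f"LLM:   {llm_w_per_conn:.2e} W per connection")
print(f"Brain: {brain_w_per_conn:.2e} W per connection")
print(f"LLM uses ~{llm_w_per_conn / brain_w_per_conn:.0f}x more energy per connection")
```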

So we might see a physical limit to how intelligent an LLM can get so long as it's running on doped-silicon chip architecture. Proximity matters, a lot, for the machines we use to run LLMs. We can't necessarily just add more processors and get smarter machines. They need to be smaller, and more energy efficient.

This is an approximation of Moore's law, onto which I've mapped the physical limits of silicon:

        So we really "cant" in big wavy parens, get close to what we would need to get the same computational density we see in humans. Then you have plants, where literally their energy is use is negative, and they have orders of magnitude more connections per kg than either silicon or human brain tissue are capable of.

So now we get to the question "Are all connections created equal?", which I think we can pretty easily answer: no. See the examples I gave, and many, many more.

        We will see architectural improvements to current ML approaches.


[email protected] wrote
#4

          This is all good info, thanks.

          I just have one minor nitpick:

An LLM takes about 80 watts per inquiry reference. So an LLM takes quite a bit more energy per connection to run. We're running 100x the connections on 1/4th the power.

The math is wrong here. At "rest", the brain is still doing work. What you call "an inquiry reference" is just one LLM operation. I'm sure the human brain is doing much more than that. A human being is thinking "What should I have for dinner?", "What should I have said to Gwen instead?", "That ceiling needs some paint", "My back hurts a bit". That clang you heard earlier in the distance? You didn't pay attention to it at the time, but your brain surely did. And let's not even get into the body functions that the brain, yes, the brain, must manage.

So an LLM is much, much, much more resource intensive than that claim suggests.

          • E [email protected]

            This is all good info, thanks.

            I just have one minor nitpick:

            An LLM takes about 80 watts per inquiry reference. So a LLM takes quite a bit more energy per connection to run. We’re running 100x the connections on 1/4th the power.

            The math is wrong here. At "rest", the brain is still doing work. What you call "an inquiry reference", is just one LLM operation. I'm sure the human brain is doing much more than that. A human being is thinking "what should I have for dinner?" "What should have I said to Gwen instead?" "That ceiling needs some paint" "My back hurts a bit". That clang you heard earlier in the distance? You didn't pay attention to it at the time, but your brain surely did. Let's not begin with the body functions that the brain, yes, the brain, must manage.

            So an LLM is much, much, much more resource intensive than that claim.

[email protected] wrote (last edited by [email protected])
#5

I think the "at rest" figure covers normal human temperature, normal heartbeat, normal breathing, etc.

We can use a Kill A Watt to measure the machine side.
