I have many conversations with people about Large Language Models like ChatGPT and Copilot. The idea that “it makes convincing sentences, but it doesn’t know what it’s talking about” is a difficult concept to convey or wrap your head around, because the sentences are so convincing.

Any good examples on how to explain this in simple terms?

Edit: some good answers already! I find especially that the emotional barrier is difficult to break. If an AI says something malicious, our brain immediately jumps to “it has intent”. How can we explain this away?

  • AbouBenAdhem@lemmy.world

    Compression algorithms can reduce most written text to about 20–25% of its original size—implying that that’s the amount of actual unique information it contains, while the rest is filler.

    Empirical studies have found that chimps and human infants, when looking at test patterns, will ignore patterns that are too predictable or too unpredictable—with the sweet spot for maximizing attention being patterns that are about 80% predictable.

    AI programmers have found that generating new text by always predicting the single most likely continuation of the given input results in text that sounds boring and robotic. Through trial and error, they found that, instead of choosing the most likely result, sampling within roughly an 80% likelihood threshold produces results judged most interesting and human-like.

    The point being: AI has stumbled on a method of mimicking meaning by imitating the ratio of novelty to predictability that characterizes real human thought. But it doesn’t follow that the source of that novelty is anything that actually resembles human cognition.
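    That trick of not always taking the single most likely word is a real technique (“nucleus”, or top-p, sampling). A toy sketch in Python, with invented probabilities, just to show the difference:

```python
import random

# Toy next-token distribution -- these probabilities are invented for illustration.
next_token_probs = {"the": 0.45, "a": 0.25, "dog": 0.15, "quantum": 0.10, "xylophone": 0.05}

def pick_greedy(probs):
    """Always take the single most likely token -- sounds boring and robotic."""
    return max(probs, key=probs.get)

def pick_nucleus(probs, p=0.8):
    """Sample only from the smallest set of top tokens whose probabilities
    sum to at least p: some novelty, but the weird tail is cut off."""
    ranked = sorted(probs.items(), key=lambda kv: -kv[1])
    nucleus, total = [], 0.0
    for token, prob in ranked:
        nucleus.append((token, prob))
        total += prob
        if total >= p:
            break
    tokens, weights = zip(*nucleus)
    return random.choices(tokens, weights=weights)[0]

print(pick_greedy(next_token_probs))   # always "the"
print(pick_nucleus(next_token_probs))  # "the", "a", or "dog" -- never the rare tail
```

    With p=0.8 here, only “the”, “a”, and “dog” fall inside the nucleus, so the rare tail can never be drawn.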

  • HorseRabbit@lemmy.sdf.org

    Not an ELI5, sorry. I’m an AI PhD, and I want to push back against the premises a lil bit.

    Why do you assume they don’t know? Like, what do you mean by “know”? Are you talking about conscious subjective experience? Or consistency of output? Or an internal world model?

    There’s lots of evidence to indicate they are not conscious, although they can exhibit theory of mind. E.g.: https://arxiv.org/pdf/2308.08708.pdf

    For consistency of output and internal world models, however, there is mounting evidence to suggest convergence on a shared representation of reality. E.g. this paper published 2 days ago: https://arxiv.org/abs/2405.07987

    The idea that these models are just stochastic parrots that only probabilistically repeat their training data isn’t correct, although it is often repeated online for some reason.

    One bit of evidence that comes to mind is this paper showing models can understand rare English grammatical structures even when those structures are deliberately withheld during training: https://arxiv.org/abs/2403.19827

    • GamingChairModel@lemmy.world

      The idea that these models are just stochastic parrots that only probabilistically repeat their training data isn’t correct

      I would argue that it is quite obviously correct, but that the interesting question is whether humans are in the same category (I would argue yes).

      • HorseRabbit@lemmy.sdf.org

        People sometimes act like the models can only reproduce their training data, which is what I’m saying is wrong. They do generalise.

        During training the models are trained to predict the next word, but after training the network is effectively always interpolating between the training examples it has memorised. This interpolation doesn’t happen in text space, though, but in a very high-dimensional abstract semantic representation space, a ‘concept space’.

        Now imagine that you have memorised two paragraphs that occupy two points in concept space. And then you interpolate between them. This gives you a new point, potentially unseen during training, a new concept, that is in some ways analogous to the two paragraphs you memorised, but still fundamentally different, and potentially novel.
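        For intuition, here’s a toy sketch of that interpolation, using a made-up four-dimensional “concept space” (real models use thousands of dimensions; all the numbers here are invented):

```python
# Invented 4-dimensional "concept space" coordinates for two memorised paragraphs.
paragraph_a = [1.0, 0.0, 0.5, 0.2]
paragraph_b = [0.0, 1.0, 0.5, 0.8]

def interpolate(a, b, t):
    """Linear interpolation: t=0 gives a, t=1 gives b, t=0.5 an even blend."""
    return [(1 - t) * x + t * y for x, y in zip(a, b)]

midpoint = interpolate(paragraph_a, paragraph_b, 0.5)
print(midpoint)  # [0.5, 0.5, 0.5, 0.5] -- a point that matches neither paragraph
```

        The midpoint is a perfectly valid point in the space that was never seen during training: analogous to both paragraphs, identical to neither.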

    • meteokr@community.adiquaints.moe

      I really appreciate you linking studies about this topic, as finding this kind of research can be daunting. Those look like really interesting reads.

  • BlameThePeacock@lemmy.ca

    It’s just fancy predictive text, like the suggestions while texting on your phone. It guesses what the next word should be, just for far more complex topics.

    • k110111@feddit.de

      It’s like saying an OS is just a bunch of if-then-else statements. While that’s true, in practice it is far, far more complicated.

    • kambusha@sh.itjust.works

      This is the one I got from the house to get the kids to the park and then I can go to work and then I can go to work and get the rest of the day after that I can get it to you tomorrow morning to pick up the kids at the same time as well as well as well as well as well as well as well as well as well… I think my predictive text broke

  • rufus@discuss.tchncs.de

    It’s like your 5-year-old daughter relaying to you what she made of something she heard earlier.

    That’s my analogy. ChatGPT kind of has the intellect, and the ability to differentiate between fact and fiction, of a 5-year-old. But it combines that with the writing style of a 40-year-old with an uncanny love of mixing adjectives and sounding condescending.

  • Ziggurat@sh.itjust.works

    Have you played that game where everyone writes a subject and puts it on a stack of paper, then everyone puts a verb on a different stack, then everyone puts an object on a third stack, and you can even add a place or whatever on the next stack? You end up with fun sentences like “A cat eats Kevin’s brain on the beach.” It’s the kind of stuff (pre-)teens do to have a good laugh.

    ChatGPT somehow works the same way, except that instead of having 10 papers in 5 stacks, it has millions of papers in thousands of stacks, and depending on the “context” it will choose which stack it draws paper from (to take an ELI5 analogy).
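    The party game itself is easy to put in code; a tiny sketch, where all the “stacks” are invented:

```python
import random

# One "stack" of paper slips per sentence slot, as in the party game.
subjects = ["A cat", "My neighbour", "The teacher", "Kevin"]
verbs = ["eats", "paints", "throws", "hugs"]
objects = ["a brain", "the moon", "a sandwich", "some homework"]
places = ["on the beach", "in space", "under the table"]

def draw_sentence():
    """Draw one slip from each stack: grammatical, but any meaning is accidental."""
    return " ".join(random.choice(stack) for stack in (subjects, verbs, objects, places))

print(draw_sentence())  # e.g. "A cat eats the moon under the table"
```

    The big difference with ChatGPT is that its choice of “stack” at each step is conditioned on everything drawn so far, which is what makes the output hang together.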

    • Hucklebee@lemmy.world (OP)

      I think what makes it hard to wrap your head around is that sometimes this text is emotionally charged. What I notice is that it’s especially hard if an AI “goes rogue” and starts saying sinister and malicious things. Our brain immediately jumps to “it has bad intent”, when in reality it’s just drawing on some Reddit posts where it happened to connect troll messages or extremist texts.

      How can we decouple emotionally when it feels so real to us?

  • Deconceptualist@lemm.ee
    2 months ago

    You could maybe just share a meme like this one.

    Some folks in the comments there share actual LLM results, a few of which are sensible but plenty that aren’t far off from the joke.

    • otp@sh.itjust.works

      LMAO! I tried it, and it said:

      The man takes the goat across the river first, then he returns alone and takes the boat across again. Finally, he brings the goat’s friend, Mr. Cabbage, across the river.

      LMAO

      I asked what if the man can’t swim…

      In that case, the man can still take the goat across the river first. Then, he ties the goat to the boat and pulls the boat across the river using a rope, ensuring he stays on the side of the river where he can safely manage the boat without swimming.

      I asked who Mr. Cabbage is…

      My apologies for the confusion! Mr. Cabbage is a playful addition. In the classic riddle, there’s usually no mention of Mr. Cabbage. It’s just the man, the goat, and perhaps another item or animal depending on the version of the riddle. I just added Mr. Cabbage for some extra fun!

      Then I asked what some other additions could be…

      Certainly! You can add various characters or items to make the riddle more entertaining. Here are a few ideas:

      1. Mr. Wolf: The man, the goat, and Mr. Wolf need to cross the river. But if the man leaves the wolf alone with the goat, the wolf will eat the goat.

      2. Ms. Hen: The man, the goat, and Ms. Hen need to cross the river. But if the man leaves the hen alone with the goat, the goat will eat the hen.

      3. Professor Fox: The man, the goat, and Professor Fox need to cross the river. But if the man leaves the fox alone with the goat, the fox will eat the goat.

      4. Princess Diamond: The man, the goat, and Princess Diamond’s precious gem need to cross the river. But Princess Diamond has warned that if anyone other than herself touches the gem, they’ll be cursed.

      Feel free to mix and match or come up with your own additions!

      And the “solution”…

      Ah, that’s a fun twist! Let’s see:

      1. The man takes Mr. Cabbage across first, then returns alone.
      2. Next, he takes Ms. Hen across but brings Mr. Cabbage back.
      3. He then takes Princess Diamond’s gem across, leaves it on the other side, and takes Mr. Wolf across.
      4. The man leaves Mr. Wolf with Ms. Hen on the other side and returns alone.
      5. He takes Princess Diamond across, but before he leaves, he swaps Mr. Cabbage with Princess Diamond’s gem.
      6. The man brings Mr. Cabbage across, leaving Princess Diamond’s gem with Mr. Wolf.
      7. Finally, he takes the goat across.

      Everyone safely crosses the river without any mishaps!

    • Rolando@lemmy.world

      This is what I was going to point to. When I was in grad school, it was often referred to as the Symbol Grounding Problem. Basically it’s an interdisciplinary research problem involving pragmatics, embodied cognition, and a bunch of other fields. The LLM people are now crashing into this research problem, and it’s interesting to see how they react.

    • Asifall@lemmy.world

      I always thought the Chinese Room argument was kinda silly. It’s predicated on the idea that humans have some unique capacity to understand the world that can’t be replicated by a syntactic system, but there is no attempt made to actually define this capacity.

      The whole argument depends on our intuition that we think and know things in a way inanimate objects don’t. In other words, it’s a tautology to draw the conclusion that computers can’t think from the premise that computers can’t think.

  • DaDragon@kbin.social

    The shorthand answer I’d try to give people is “it’s statistics”. Based on training data, there’s a certain chance of certain words being in proximity to each other. There’s no reasoning behind the placement, other than whatever pattern is discernible from known situations.

  • CodeInvasion@sh.itjust.works

    I am an LLM researcher at MIT, and hopefully this will help.

    As others have answered, LLMs have only learned the ability to autocomplete given some input, known as the prompt. Functionally, the model is strictly predicting the probability of the next word+ (called a token), with some randomness injected so the output isn’t exactly the same for any given prompt.

    The probability of the next word comes from what was in the model’s training data, in combination with a very complex mathematical method, called self-attention, that computes the impact of every previous word on every other previous word and on the newly predicted word. You can think of this as a computed relatedness factor.

    This relatedness factor is very computationally expensive, and the cost grows quadratically with the number of words considered, so models are limited in how many previous words can be used to compute relatedness. This limit is called the Context Window. The recent breakthroughs in LLMs come from the use of very large context windows to learn the relationships of as many words as possible.
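    A toy illustration of that relatedness computation, using plain dot products on invented two-dimensional token vectors (real self-attention adds learned projections and normalisation on top of this):

```python
# Invented 2-dimensional vectors for three tokens.
tokens = {"the": [1.0, 0.0], "cat": [0.8, 0.6], "sat": [0.1, 0.9]}

def dot(a, b):
    """Dot product: the toy stand-in for a 'relatedness' score."""
    return sum(x * y for x, y in zip(a, b))

# One score per ordered (token, token) pair: n tokens need n * n scores,
# which is why doubling the context window roughly quadruples the work.
scores = {(t1, t2): dot(v1, v2)
          for t1, v1 in tokens.items()
          for t2, v2 in tokens.items()}

print(len(scores))                       # 9 scores for 3 tokens
print(round(scores[("the", "cat")], 2))  # 0.8
```
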

    This process of predicting the next word is repeated iteratively until a special stop token is generated, which tells the model to stop generating more words. So literally, the model builds entire responses one word at a time, from left to right.
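    That left-to-right loop is simple to sketch. Here’s a toy version in which a hand-written lookup table stands in for the entire neural network (everything in the table is invented):

```python
import random

# Toy continuation table; "<stop>" plays the role of the special stop token.
table = {
    "<start>": ["the"],
    "the": ["cat", "dog"],
    "cat": ["sat", "<stop>"],
    "dog": ["ran", "<stop>"],
    "sat": ["<stop>"],
    "ran": ["<stop>"],
}

def generate():
    """Append one word at a time, left to right, until the stop token appears."""
    words = ["<start>"]
    while words[-1] != "<stop>":
        words.append(random.choice(table[words[-1]]))
    return " ".join(words[1:-1])

print(generate())  # e.g. "the cat sat"
```
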

    Because all future words are predicated on the previously stated words in either the prompt or subsequent generated words, it becomes impossible to apply even the most basic logical concepts, unless all the components required are present in the prompt or have somehow serendipitously been stated by the model in its generated response.

    This is also why LLMs tend to work better when you ask them to work out all the steps of a problem instead of jumping to a conclusion, and why the best models tend to rely on extremely verbose answers to give you the simple piece of information you were looking for.

    From this fundamental understanding, hopefully you can now reason about the LLM’s limitations in factual understanding as well. For instance, if a given fact was never mentioned in the training data, or an answer simply doesn’t exist, the model will make it up, inferring the next most likely word to create a plausible-sounding statement. Essentially, the model has been faking language understanding so well that even when it has no factual basis for an answer, it can easily trick an unwitting human into believing the answer to be correct.

    ----

    +More specifically, these words are tokens, which usually represent a word or some smaller piece of one. For instance, “understand” and “able” would be represented as two tokens that, when put together, become the word “understandable”.
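    A toy greedy tokenizer shows the idea; the vocabulary here is hand-picked for the example (real tokenizers, e.g. BPE, learn their vocabularies from data):

```python
# Hand-picked subword vocabulary, just for illustration.
vocab = {"understand", "stand", "under", "able"}

def tokenize(word):
    """Greedily split a word into the longest known subword pieces."""
    tokens, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):  # try the longest remaining piece first
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            tokens.append(word[i])  # unknown character becomes its own token
            i += 1
    return tokens

print(tokenize("understandable"))  # ['understand', 'able']
```
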

    • HamsterRage@lemmy.ca

      I think that a good starting place to explain the concept to people would be to describe a Travesty Generator. I remember playing with one of those back in the 1980s. If you fed it a snippet of Shakespeare, what it churned out sounded remarkably like Shakespeare, even if it created brand-“new” words.

      The results were goofy, but fun because it still almost made sense.

      The most disappointing source text I ever put in was TS Eliot. The output was just about as much rubbish as the original text.

    • Sabata11792@kbin.social

      As some nerd playing with various AI models at home with no formal training, is there any wisdom you think is worth sharing?

  • JackbyDev@programming.dev

    I think a good example would be finding similar prompts that reliably give contradictory information.

    It’s sort of like autopilot. It just believes everything and follows everything as if it were instructions. Prompt injection and jailbreaking are examples of this. It’s almost exactly like the trope where you trick an AI into realizing it’s hit a contradiction and it explodes.

  • SwearingRobin@lemmy.world

    The way I’ve explained it before is that it’s like the autocomplete on your phone. Your phone doesn’t know what you’re going to write, but it can predict that after word A, it is likely word B will appear, so it suggests it. LLMs are just the same as that, but much more powerful and trained on the writing of thousands of people. The LLM predicts that after prompt X, the most likely set of characters to follow it is set Y. No comprehension required, just prediction based on previous data.
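    That phone-autocomplete idea can be built in a few lines by counting which word follows which; a toy sketch with a made-up “corpus”:

```python
from collections import Counter, defaultdict

# Tiny made-up "training corpus", standing in for the writing of thousands of people.
corpus = "the cat sat on the mat and the cat ran to the mat and the cat slept".split()

# Count how often each word B follows each word A.
following = defaultdict(Counter)
for a, b in zip(corpus, corpus[1:]):
    following[a][b] += 1

def suggest(word):
    """Suggest the most frequent next word -- prediction, not comprehension."""
    return following[word].most_common(1)[0][0]

print(suggest("the"))  # "cat" -- it followed "the" most often in this corpus
```

    An LLM replaces the simple counts with a neural network conditioned on the whole prompt, but the output is still a next-word prediction.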

  • rubin@lemmy.sdf.org

    Imagine that you have a random group of people waiting in line at your desk. You have each one read the prompt and the response so far, and then add a single word themselves. Then they leave, and the next person in line comes up and does the same.

    This is why “why did you say ?” questions are nonsensical to AI. The code answering is not the code that wrote it, and there is no communication, coordination, or anything else between the different word-answerers.

  • Hegar@kbin.social

    Part of the problem is hyperactive agency detection - the same biological bug/feature that fuels belief in the divine.

    If a twig snaps, it could be nothing or someone. If it’s nothing and we react as if it was someone, no biggie. If it was someone and we react as if it was nothing, potential biggie. So our brains are biased towards assuming agency where there is none, to keep us alive.