
You can’t ask a compulsive liar for the truth
LLMs have no concept of lying, being truthful, or being correct or wrong, even if they exhibit behavior that seems that way and generate words that say as much. Explaining this isn’t trivial, but I think it becomes self-evident once you consider that there are situations in which asking someone or something about itself doesn’t give you the full story: whoever you ask could be playing pretend or exhibiting misleading behavior.
I remember reading something about LLMs not being able to learn “x is y” equivalence relations. I can’t find it now, but limitations like this are what make clear the difference between what humans do and what we’ve managed to teach the neural network (limitations which will be used to iterate on and improve the model further, of course).
In the Chinese room analogy, this would be like them knowing cats are considered cute but not whether considered-cute animals include cats (if I remember the limitation type correctly). If you happen to slip in the right instructions/questions, something they’ve seen before or something they’re capable of extrapolating from, then nothing seems off. But if someone can say in one paragraph that cats are cute and yet know of no cute animal, you wouldn’t think they understand what they’re saying, and so they don’t really understand the language, even if they give you plausible words in all other cases.
(For cats it’ll work because there’s a billion example sentences out there. LLM vendors are also trying to sidestep such problems by having the model generate a bunch of tangential text before answering the prompt, in which it might happen to regurgitate the tokens it needs to piece together the answer, but that still isn’t the same as being able to apply logic.)
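To make the one-directional “x is y” point concrete, here is a toy sketch in Python. It is only an analogy: a lookup table is not how an LLM stores anything, and the facts and function names are made up for illustration. It shows how something can answer the forward question while having nothing to say about the reverse one, unless the reverse is actually derived from what it knows.

```python
# Toy analogy only: a dict is NOT how an LLM represents knowledge.
# The facts and function names below are hypothetical, for illustration.
forward = {"cat": "cute", "shark": "scary"}  # stored as "x is y"


def describe(animal):
    """Forward direction ("x is y"): works for anything that was stored."""
    return forward.get(animal)


def animals_that_are(trait):
    """Reverse direction by direct lookup only: traits were never keys,
    so this finds nothing even though the fact is 'in there'."""
    return forward.get(trait)


def animals_that_are_derived(trait):
    """Reverse direction by reasoning over the stored forward facts."""
    return [animal for animal, t in forward.items() if t == trait]


print(describe("cat"))                   # forward question
print(animals_that_are("cute"))          # reverse question, lookup only
print(animals_that_are_derived("cute"))  # reverse question, derived
```

The second call comes back empty-handed while the third does not, and the only difference is whether the reverse relation gets worked out from the forward one rather than having to be seen directly.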