Interesting, illuminating (but contested) metaphor for thinking about LLMs from one of my favourite authors, Ted Chiang:
"Think of ChatGPT as a blurry jpeg of all the text on the Web. It retains much of the information... but, if you’re looking for an exact sequence of bits, you won’t find it; all you will ever get is an approximation... nonsensical answers to factual questions... are compression artifacts... plausible enough that identifying them requires comparing them against the originals...
a common technique used by lossy compression algorithms is interpolation ... when [ChatGPT] is prompted to describe... losing a sock in the dryer using the style of the Declaration of Independence: it is taking two points in “lexical space” and generating the text that would occupy the location between them."
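Chiang's interpolation analogy can be sketched in a few lines. This is a toy illustration, not how an LLM actually works: the "style" and "topic" vectors below are invented stand-ins for points in his "lexical space", and the interpolation is the same linear blend that lossy codecs use to fill in missing samples between known ones.

```python
# Sketch of Chiang's "interpolation in lexical space" analogy.
# The vectors below are invented toy "embeddings"; real models do not
# literally average two documents, but lossy codecs do interpolate
# between known samples to reconstruct what lies between them.

def lerp(a, b, t):
    """Linearly interpolate between two equal-length vectors."""
    return [(1 - t) * x + t * y for x, y in zip(a, b)]

# Two imaginary points in "lexical space" (values are made up):
declaration_style = [1.0, 0.0, 0.8]
lost_sock_topic   = [0.0, 1.0, 0.2]

# The midpoint "occupies the location between them":
blend = lerp(declaration_style, lost_sock_topic, 0.5)
print(blend)
```

The point of the analogy: the blended output never existed anywhere in the source data, which is exactly why it can look novel and be wrong at the same time.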
But the best way of compressing knowledge is to understand it: "the more the program knows about supply and demand, the more words it can discard when compressing the pages about economics... [But can] we say that it actually understands economic theory?", given that its understanding stems from statistical analyses of text that "reveal that phrases like “supply is low” often appear in close proximity to phrases like “prices rise”"?
It certainly doesn't do maths well once the numbers get large: "there aren’t many Web pages that contain the text “245 + 821”". So if it hasn’t mastered basic maths but can write college-level essays, do "statistical regularities in text actually correspond to genuine knowledge of the real world?"
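The arithmetic point can be made concrete with a toy contrast between rote recall and actual computation. The "seen on the Web" table below is invented for illustration: it stands in for sums that happen to appear verbatim in training text.

```python
# Toy contrast: memorised text versus genuine arithmetic.
# The lookup table is invented; it represents sums that appear
# verbatim somewhere in the training data.

seen_on_the_web = {
    ("2", "2"): "4",
    ("7", "3"): "10",
}

def recall(a, b):
    """A rote learner: can only repeat sums it has seen written down."""
    return seen_on_the_web.get((a, b))

def compute(a, b):
    """Real mastery of addition works on any operands."""
    return str(int(a) + int(b))

print(recall("2", "2"))       # memorised, so it works
print(recall("245", "821"))   # never appeared verbatim, so nothing comes back
print(compute("245", "821"))  # actual addition handles it regardless
```

A statistical compressor of text sits somewhere between these two functions, which is exactly what makes the "does it understand?" question hard to settle.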
Consider a chatbot which simply quotes relevant pages: "In human students, rote memorization isn’t an indicator of genuine learning, so ChatGPT’s inability to produce exact quotes from Web pages is precisely what makes us think that it has learned something... lossy compression looks smarter than lossless compression."
But what is it actually useful for?
So... not that useful: "So just how much use is a blurry jpeg, when you still have the original?"
More Stuff I Like