Jon Stokes thinks "people are talking about this chatbot in unhelpful ways... anthropomorphizing ... [and] not working with a practical, productive understanding of what the bot’s main parts are and how they fit together."
So he wrote this explainer.
"At the heart of ChatGPT is a large language model (LLM) that belongs to the family of generative machine learning models... a function that can take a structured collection of symbols as input and produce a related structured collection of symbols as output... [like] Letters in a word, Words in a sentence, Pixels in an image, Frames in a video... The hard and expensive part ... is hidden deep inside the word related... the more abstract and subtle the relationship, the more technology we’ll need".
He provides four examples of relationships between two concepts, noting that "as the number of possible relationships increases, the qualities of the relationships themselves increase in abstraction, complexity, and subtlety... [In this] bewildering, densely connected network of concepts", some relationships are more likely than others, introducing an element of probability.
For example, "we’re talking about a cat, it’s more likely that the mature/immature dichotomy is related to a cluster of concepts around physical development and less likely that it’s related to a cluster of concepts around emotional or intellectual development", but it still might me about the latter.
In summary, "When the relationships among collections of symbols are complex and stochastic, then throwing more storage and computing power at the problem of relating one collection to another enables you to relate those collections in ever richer and more complex ways." He then introduces the concept of probability distribution using the atomic orbitals of hydrogen atoms(!).
Why? For LLMs, "each possible blob of text the model could generate ... is a single point in a probability distribution". So when you write a text and hit "Submit", you're collapsing the wave function, resulting "in an observation of a single collection of symbols", so sometimes you arrive at "a point in the probability distribution ... like, {The cat is alive}, and at other times you’ll end up at a point that corresponds to {The cat is dead}... depending on the shape of the probability distributions ... and on the dice that the ... computer’s random number generator is rolling."
Instead of thinking that the model knows the cat's status, understand that "In the space of all the possible collections of symbols the model could produce... there are regions in the model’s probability distributions that contain collections of symbols we humans interpret to mean that the cat is alive. And ... adjacent regions ... containing collections of symbols we interpret to mean the cat is dead... ChatGPT’s latent space — i.e., the space of possible outputs ... has been deliberately sculpted into a particular shape by an expensive training process".
Different input collections (prompts) will result in different output collections (responses), which is why we can ask the same question in different ways and get different answers: we’re landing on different points in a probability distribution.
So can we "eliminate or at least shrink the probability distribution" of untrue statements? And should we?
We have three tools for doing so, which raises some interesting questions.
Only human beings can interpret the meaning of a text. An LLM is not an author and has no intent, so there is no authorial intent; LLMs therefore perhaps represent "humanity’s first practical use for reader response theory... ChatGPT doesn’t think about you". Its chat-based UX is just "a kind of skeuomorphic UI affordance"; it is not "truly talking to you... just shouting language-like symbol collections into the void".
To understand this better, understand token windows: "pre-ChatGPT language models ... take in tokens (symbols) and spit out tokens. The window is the number of tokens a model can ingest... the probability space does not change shape in response to the tokens... weights remain static ... It doesn’t remember", which is why "GPT-3 purposefully inject[s] randomness".
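A small sketch of that idea (assumptions mine; "temperature" is a common name for the knob that injects this randomness, though Stokes doesn't name it here): because the weights are static, the same prompt always yields the same distribution, and variety comes from how sharply or flatly you sample from it.

```python
# Toy sketch: static weights produce a fixed distribution; the injected
# randomness is tuned by a "temperature" -- higher temperature flattens the
# distribution, so less likely continuations get picked more often.
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw model scores into probabilities, scaled by temperature."""
    scaled = [l / temperature for l in logits]
    exps = [math.exp(s - max(scaled)) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]                      # hypothetical scores for 3 continuations
print(softmax_with_temperature(logits, 0.5))  # sharp: almost always the top choice
print(softmax_with_temperature(logits, 2.0))  # flat: much more randomness in the output
```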
ChatGPT, by contrast, does remember, "appending the output of each new exchange to the existing output so that the content of the token window grows". While it can access the entire chat history, that's all it knows about you. The token window is therefore a "shared, mutable state" that the model and I develop together, giving the model more input to find the most relevant word sequence for me.
Bing Chat seems to include a web search as well.
The upcoming "32K-token window [is]... enough tokens to really load up the model with fresh facts, like customer service histories, book chapters or scripts, action sequences, and many other things."