Meaning as Geometry
The distributional hypothesis and why context defines meaning
Here is a core idea: we can represent meaning as position in space. Not physical space—a mathematical space where each dimension captures some aspect of meaning. In this space, similar concepts are nearby. Related ideas form clusters. And the relationships between concepts become geometric operations.
This is not metaphor. It is the foundation of how modern AI understands language.
The Distributional Hypothesis
In 1957, the British linguist J.R. Firth wrote: "You shall know a word by the company it keeps." This observation, formalized as the distributional hypothesis, states that words appearing in similar contexts have similar meanings.
Consider the words "dog" and "cat." They appear in similar contexts: "The ___ is sleeping on the couch." "I need to feed the ___." "The ___ chased the ball." "She took the ___ to the vet." Because they occur in similar linguistic neighborhoods, they must share some meaning. Both are pets, animals, household companions. The contexts reveal this without anyone explicitly teaching it.
Interactive: Contexts reveal meaning
"dog" and "cat" share many context words, suggesting similar meanings.
This is empirically testable. Analyze billions of words of text. Count which words appear near which other words. Words with similar co-occurrence patterns cluster together. The hypothesis holds.
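The counting procedure is simple enough to sketch directly. The snippet below tallies co-occurrences within a small window over a toy four-sentence corpus (the corpus and window size are illustrative choices, not from any real dataset):

```python
# Tally which words appear within a small window of each other.
from collections import defaultdict

corpus = [
    "the dog is sleeping on the couch",
    "the cat is sleeping on the couch",
    "i need to feed the dog",
    "i need to feed the cat",
]

window = 2  # count words within 2 positions on either side
cooccur = defaultdict(lambda: defaultdict(int))

for sentence in corpus:
    tokens = sentence.split()
    for i, word in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if i != j:
                cooccur[word][tokens[j]] += 1

# "dog" and "cat" end up with identical context counts in this tiny corpus.
print(dict(cooccur["dog"]))
print(dict(cooccur["cat"]))
```

Even at this miniature scale, "dog" and "cat" accumulate the same context profile, which is exactly the signal the hypothesis predicts; at the scale of billions of words, the profiles become rich enough to separate thousands of concepts.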
From Counts to Vectors
How do we turn this insight into computation? The direct approach: build a giant table. Rows are words. Columns are context words. Each cell counts how often the row word appears near the column word.
If we have 50,000 words in our vocabulary, each word gets a 50,000-dimensional vector—one dimension for each possible context word. The value in each dimension is the count (or a transformation of it) for that context.
Words with similar count patterns have similar vectors. "Dog" and "cat" have high counts for "pet," "feed," "vet," and low counts for "economy," "algorithm," "parliament."
Building vectors from context
| Word | pet | animal | fur | vet | engine | wheel | road | drive |
|---|---|---|---|---|---|---|---|---|
| dog | high | high | high | high | low | low | low | low |
| cat | high | high | high | high | low | low | low | low |

Cells marked "high" indicate frequent co-occurrence. Similar patterns mean similar vectors.
This raw approach works but is impractical. 50,000 dimensions is too many, most values are zero, and raw counts overweight common words. Modern embedding methods learn a compressed representation—typically 256 to 1024 dimensions—that preserves the essential relationships while discarding noise.
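One classic way to perform this compression is a truncated singular value decomposition of the co-occurrence matrix, the idea behind latent semantic analysis. The sketch below uses made-up counts (the numbers are illustrative, not measured from a corpus) and keeps only the top two directions:

```python
import numpy as np

# Toy co-occurrence matrix: rows = words, columns = context words.
# The counts are illustrative values, not real corpus statistics.
words = ["dog", "cat", "car"]
contexts = ["pet", "feed", "vet", "engine", "road"]
counts = np.array([
    [8.0, 6.0, 3.0, 0.0, 0.0],   # dog
    [7.0, 5.0, 4.0, 0.0, 0.0],   # cat
    [0.0, 0.0, 0.0, 9.0, 7.0],   # car
])

# Truncated SVD keeps the k directions of greatest variance,
# giving each word a dense k-dimensional vector.
k = 2
U, S, Vt = np.linalg.svd(counts, full_matrices=False)
embeddings = U[:, :k] * S[:k]   # one k-dim vector per word

print(embeddings.shape)  # (3, 2)
```

After compression, "dog" and "cat" still point in nearly the same direction while "car" points elsewhere: the essential relationships survive even though the representation shrank from five dimensions to two. Modern neural embedding methods learn the compressed vectors directly rather than factoring an explicit count matrix, but the effect is analogous.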
The Geometry of Meaning
Once words become vectors, meaning becomes geometry. Consider what it means for two vectors to be similar:
Cosine similarity measures the angle between vectors. Vectors pointing in the same direction are similar regardless of their length. This captures the intuition that "dog" and "cat" are related concepts—they point in similar semantic directions.
Distance measures how far apart vectors are. Nearby vectors represent related concepts. Distant vectors represent unrelated concepts.
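Both measures are a few lines of NumPy. The vectors below are hypothetical 3-dimensional stand-ins (real embeddings have hundreds of dimensions, and these values were chosen by hand for illustration):

```python
import numpy as np

def cosine_similarity(a, b):
    # Angle-based: 1.0 means same direction, 0.0 means orthogonal.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def euclidean_distance(a, b):
    # Straight-line distance between the two points.
    return float(np.linalg.norm(a - b))

# Hypothetical vectors for illustration only.
dog = np.array([0.9, 0.8, 0.1])
cat = np.array([0.85, 0.75, 0.15])
economy = np.array([0.1, 0.2, 0.9])

print(cosine_similarity(dog, cat))      # close to 1.0
print(cosine_similarity(dog, economy))  # much lower
```

Cosine similarity is the more common choice in practice because it ignores vector length, which often reflects word frequency rather than meaning.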
Interactive: Explore the vector space
Words cluster by semantic category.
This actually works. Train a model on billions of words, and the resulting vector space organizes itself into meaningful structure. Animals cluster together. Countries cluster together. Verbs cluster separately from nouns. Abstract concepts separate from concrete ones. No one programmed this structure. It emerges from the statistics of language use.
Analogies as Directions
The most famous demonstration of semantic geometry is analogy completion. Consider: king - man + woman ≈ queen.
This works because relationships between concepts become directions in the vector space. The direction from "man" to "woman" encodes a gender transformation. Apply that same direction to "king" and you arrive near "queen."
Interactive: Explore vector analogies
The analogy resolves step by step as vector arithmetic.
Other relationships work too. Capital cities: Paris - France + Italy ≈ Rome. Verb tenses: walking - walk + swim ≈ swimming. Comparatives: bigger - big + small ≈ smaller. These emerge without explicit training. The model discovers that language encodes regularities, and those regularities become geometric structure.
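The arithmetic itself can be sketched in a few lines. The 2-dimensional vectors below are hand-picked so that the gender direction is consistent across the pairs; a real model learns such structure from data rather than having it assigned:

```python
import numpy as np

# Hypothetical 2-d embeddings, chosen so the "gender" direction
# (second component) is consistent. Real embeddings are learned.
vectors = {
    "king":  np.array([0.9, 0.8]),
    "queen": np.array([0.9, 0.2]),
    "man":   np.array([0.3, 0.8]),
    "woman": np.array([0.3, 0.2]),
}

def nearest(target, exclude):
    # Return the vocabulary word whose vector has the highest
    # cosine similarity to the target, skipping excluded words.
    def cos(a, b):
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    candidates = {w: v for w, v in vectors.items() if w not in exclude}
    return max(candidates, key=lambda w: cos(vectors[w], target))

# king - man + woman ≈ queen
result = vectors["king"] - vectors["man"] + vectors["woman"]
print(nearest(result, exclude={"king", "man", "woman"}))  # queen
```

Excluding the query words from the candidates mirrors standard practice in analogy evaluation, since the unmodified input words are often the closest vectors to the result.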
What Dimensions Mean
A natural question: what do the individual dimensions represent? If a word vector has 300 dimensions, what is dimension 47?
The honest answer: individual dimensions are not interpretable. They do not correspond to human-recognizable concepts like "animal-ness" or "positivity." Instead, each dimension encodes some combination of features that, together with other dimensions, captures meaning.
Think of it like RGB color. You can represent any visible color with three numbers. But the numbers themselves (say, R=127, G=45, B=200) do not tell you "this is purple" in isolation. The meaning emerges from the combination.
The same is true for word vectors. Meaning emerges from the pattern of values across all dimensions, not from any single dimension.
Why This Matters for Search
The connection to search is direct. If we can represent text as vectors where similar meanings are geometrically similar, then search becomes geometry:

- Embed the query: transform the search query into a vector.
- Embed the documents: transform each document into a vector.
- Find nearest neighbors: the documents whose vectors are closest to the query vector are the most semantically similar.

This is semantic search. The query "how to make my code run faster" becomes a vector. Documents about performance optimization, even if they never use the word "faster," have nearby vectors because they discuss related concepts.
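The whole pipeline reduces to a ranking by similarity. In the sketch below, the document and query vectors are hypothetical stand-ins; in practice an embedding model (not shown) would produce them from the text:

```python
import numpy as np

# Hypothetical embeddings; a real system would compute these
# with an embedding model rather than hard-coding them.
documents = {
    "profiling and optimizing slow Python code": np.array([0.9, 0.1, 0.2]),
    "a history of the Roman empire":             np.array([0.1, 0.9, 0.1]),
    "tuning database query performance":         np.array([0.8, 0.1, 0.4]),
}
query = np.array([0.85, 0.05, 0.3])  # "how to make my code run faster"

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Rank documents by similarity to the query vector.
ranked = sorted(documents, key=lambda d: cosine(documents[d], query), reverse=True)
for doc in ranked:
    print(f"{cosine(documents[doc], query):.3f}  {doc}")
```

Note that neither performance document shares the word "faster" with the query; they rank highly purely because their vectors point in a similar direction. This brute-force scan works for a handful of documents, but scaling it to billions requires the approximate nearest-neighbor techniques discussed later in the course.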
The challenge now becomes: how do we learn these vectors? How do we ensure they capture meaning accurately? And how do we search billions of vectors efficiently? These questions structure the rest of this course.
Key Takeaways
- The distributional hypothesis: words appearing in similar contexts have similar meanings
- Word vectors position words in a mathematical space based on their contexts
- Similar meanings become geometrically nearby; relationships become directions
- Analogies work as vector arithmetic: king - man + woman ≈ queen
- Individual dimensions are not interpretable; meaning emerges from the whole vector
- This geometric representation enables semantic search: find documents with vectors near the query vector