Dot Product and Cosine Similarity
Two sides of the same coin: when magnitude matters and when it doesn't
Finding similar vectors requires measuring similarity. Two operations dominate: the dot product and cosine similarity. They are closely related, yet subtly different. Understanding when to use each is essential for effective retrieval.
The Dot Product
The dot product of two vectors sums the element-wise products:

a · b = a₁b₁ + a₂b₂ + ⋯ + aₙbₙ
Geometrically, it measures how much two vectors "agree" in each dimension.
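In code, the definition is a one-line sum. A minimal NumPy sketch (the array values are illustrative):

```python
import numpy as np

def dot(a: np.ndarray, b: np.ndarray) -> float:
    """Sum of element-wise products: a . b = sum(a_i * b_i)."""
    return float(np.sum(a * b))

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

print(dot(a, b))  # 1*4 + 2*5 + 3*6 = 32.0
```

NumPy's built-in `np.dot(a, b)` computes the same quantity, using vectorized hardware instructions.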
a · b = |a| × |b| × cos(θ). The dot product depends on both magnitudes and the angle.
The dot product has a geometric interpretation:

a · b = |a| × |b| × cos(θ)

where θ is the angle between the vectors. This formula reveals that the dot product depends on three things: the magnitude of a, the magnitude of b, and the angle between them.
This is both useful and problematic. Longer vectors naturally produce larger dot products, even when their direction is no better aligned with the query.
Cosine Similarity
Cosine similarity isolates the directional component by normalizing out the magnitudes:

cos_sim(a, b) = (a · b) / (|a| × |b|)
The result ranges from -1 (opposite directions) through 0 (perpendicular) to +1 (same direction).
By dividing out the magnitudes, cosine similarity measures only how aligned two vectors are. A short vector pointing in the same direction as a long vector has cosine similarity 1.
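That property is easy to check directly. A short sketch in NumPy, with two vectors that point the same way but differ 10× in length:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Dot product divided by the product of the two magnitudes."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

short_vec = np.array([1.0, 1.0])
long_vec = np.array([10.0, 10.0])  # same direction, 10x the magnitude

print(cosine_similarity(short_vec, long_vec))  # ~1.0: magnitude divided out
```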
This is usually what we want for semantic search: two texts should be similar if they point in the same semantic direction, regardless of how "much" of that direction they express.
When Magnitude Matters
Sometimes magnitude carries meaning. Consider two document embeddings: a long, detailed article about machine learning and a short tweet about machine learning. Both point in similar directions—they are "about" the same topic. But the long document has more content, more information, potentially more relevance to a detailed query.
If embeddings encode magnitude as "amount of information," dot product preserves this signal. Cosine similarity discards it.
Normalization effects
| Document | Dot Product | Cosine Sim |
|---|---|---|
| Doc A (long) | 5.63 | 1.000 |
| Doc B (short) | 2.25 | 1.000 |
| Doc C (medium) | 3.75 | 1.000 |
All vectors point in similar directions (same cosine). But dot products differ due to magnitude.
In practice, most embedding models output normalized vectors (length 1). This makes dot product and cosine similarity equivalent: when |a| = |b| = 1, the denominator vanishes and a · b = cos(θ).
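The equivalence can be verified numerically. A sketch using random stand-in vectors in place of real embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=8)
b = rng.normal(size=8)

# Normalize both vectors to unit length
a_hat = a / np.linalg.norm(a)
b_hat = b / np.linalg.norm(b)

# Dot product of the normalized vectors...
dot_normed = float(np.dot(a_hat, b_hat))
# ...equals cosine similarity of the raw vectors
cos_raw = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(abs(dot_normed - cos_raw) < 1e-12)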
Computing Similarity at Scale
The choice of metric affects more than semantics—it affects computation.
Dot product is faster. It is a single SIMD-friendly operation: multiply and add. Vector databases optimize heavily for this.
Cosine similarity requires computing norms and dividing. If vectors are pre-normalized, this cost is paid once at indexing time, not at query time.
Best practice: Normalize vectors when inserting into the database. Then use dot product for search, getting cosine similarity semantics with dot product speed.
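This pattern can be sketched with a toy in-memory index; real vector databases expose a different API, but the division of labor is the same (normalize once at insert time, then search with plain dot products):

```python
import numpy as np

class TinyIndex:
    """Toy index: normalize at insert, search with a single matrix-vector product."""

    def __init__(self, dim: int):
        self.vectors = np.empty((0, dim))

    def add(self, v: np.ndarray) -> None:
        v = v / np.linalg.norm(v)  # pay the normalization cost here, once
        self.vectors = np.vstack([self.vectors, v])

    def search(self, q: np.ndarray, k: int = 3) -> np.ndarray:
        q = q / np.linalg.norm(q)
        scores = self.vectors @ q  # plain dot products == cosine similarity
        return np.argsort(-scores)[:k]

index = TinyIndex(dim=2)
for v in (np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])):
    index.add(v)

print(index.search(np.array([2.0, 0.1])))  # doc ids, most similar first
```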
Inner Product vs Cosine in Vector Databases
Vector databases typically offer multiple metrics. The "cosine" metric normalizes vectors then computes the dot product. The "dot product" or "inner product" metric uses the raw dot product without normalization. The "euclidean" or "L2" metric measures distance rather than similarity.
Comparing similarity metrics
| vs Query | Dot Product | Cosine | Euclidean |
|---|---|---|---|
| Doc 1 | 0.990 | 0.991 | 0.132 |
| Doc 2 | 0.360 | 0.356 | 1.140 |
| Doc 3 | -0.860 | -0.922 | 1.895 |
Dot Product Rank
1. Doc 1
2. Doc 2
3. Doc 3

Cosine Rank
1. Doc 1
2. Doc 2
3. Doc 3

Euclidean Rank
1. Doc 1
2. Doc 2
3. Doc 3
Different metrics can produce different rankings. For normalized vectors, cosine and Euclidean give the same order.
For retrieval with normalized embeddings, use "cosine" if the database normalizes for you, or use "dot product" if you pre-normalize yourself—the latter is faster but gives the same results. For retrieval with unnormalized embeddings, "cosine" handles varying magnitudes gracefully while "dot product" may give unexpected results if magnitudes vary widely.
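The three metrics can be computed side by side. A sketch with made-up 2-D vectors (not the ones behind the table above):

```python
import numpy as np

def dot_sim(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b))

def cos_sim(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def l2_dist(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.linalg.norm(a - b))

query = np.array([1.0, 0.0])
docs = {"doc1": np.array([0.9, 0.1]), "doc2": np.array([0.1, 0.9])}

for name, d in docs.items():
    # similarity: higher is better; distance: lower is better
    print(name, dot_sim(query, d), cos_sim(query, d), l2_dist(query, d))
```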
Similarity vs Distance
Cosine similarity and dot product measure similarity—higher is more similar.
Some algorithms require distance, where lower means more similar. Cosine distance is simply 1 − cos_sim, ranging from 0 for identical vectors to 2 for opposite vectors. Angular distance is arccos(cos_sim) / π, ranging from 0 to 1; it is more intuitive but slower to compute due to the arc cosine.
For normalized vectors, Euclidean distance relates directly to cosine similarity:

d(a, b) = √(2 − 2 × cos_sim(a, b))

This means Euclidean distance is monotonically related to cosine similarity for unit vectors. Ranking by either gives the same order.
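Both conversions can be sketched in a few lines; the last check confirms the unit-vector identity numerically on random stand-in vectors:

```python
import numpy as np

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    """1 - cos_sim: 0 for identical directions, 2 for opposite."""
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def angular_distance(a: np.ndarray, b: np.ndarray) -> float:
    """arccos(cos_sim) / pi: 0 to 1. Clip guards against rounding past +/-1."""
    cos = np.clip(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)), -1.0, 1.0)
    return float(np.arccos(cos) / np.pi)

# For unit vectors: Euclidean distance == sqrt(2 - 2 * cos_sim)
rng = np.random.default_rng(1)
a = rng.normal(size=4); a /= np.linalg.norm(a)
b = rng.normal(size=4); b /= np.linalg.norm(b)

euclid = float(np.linalg.norm(a - b))
via_cos = float(np.sqrt(2.0 - 2.0 * np.dot(a, b)))
print(abs(euclid - via_cos) < 1e-12)  # the two agree
```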
Practical Recommendations
Start with normalized embeddings. Most models output them by default; check that yours does. With normalized vectors, use dot product search—it gives cosine semantics at lower computational cost.
Understand your specific model. Some models like ColBERT use non-normalized vectors intentionally to encode additional information in magnitude. Check the documentation rather than assuming.
Think carefully about thresholds. A cosine similarity of 0.8 means different things for different models. One model might rarely exceed 0.7 for any pair; another might cluster similar documents around 0.95. Calibrate empirically on real data, not intuitively.
Consider the distribution of values. Some embedding spaces use the full [-1, 1] range, including negative similarities. Others cluster all vectors in a narrow band like [0.3, 0.9]. Knowing your embedding distribution helps set meaningful thresholds and detect anomalies.
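One way to calibrate is to measure the distribution of pairwise similarities on your own data. A sketch with random placeholder vectors standing in for real model embeddings:

```python
import numpy as np

def similarity_percentiles(embeddings: np.ndarray) -> dict:
    """Percentiles of pairwise cosine similarity, for threshold calibration."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T
    # Upper triangle only, excluding self-similarity on the diagonal
    vals = sims[np.triu_indices_from(sims, k=1)]
    return {p: float(np.percentile(vals, p)) for p in (5, 50, 95)}

# Placeholder: swap in embeddings from your actual model
emb = np.random.default_rng(2).normal(size=(100, 32))
print(similarity_percentiles(emb))
```

If the 95th percentile sits near 0.4, a "0.8 threshold" would match almost nothing; if the 5th percentile sits near 0.7, it would match almost everything.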
Key Takeaways
- Dot product sums element-wise products; it depends on both direction and magnitude
- Cosine similarity normalizes out magnitude, measuring only directional alignment
- For normalized vectors, dot product equals cosine similarity
- Pre-normalize vectors and use dot product for the best of both: cosine semantics, maximum speed
- Vector databases offer multiple metrics; choose based on whether your vectors are normalized
- Similarity thresholds are model-specific; calibrate on real data