Distance Metrics

Euclidean, Manhattan, and why the choice matters for retrieval

Similarity measures how alike two vectors are. Distance measures how far apart they are. Both serve the same purpose—ranking candidates—but distance operates in reverse: lower is better.

Understanding the geometry of different distance metrics helps you choose the right one and interpret results correctly.

Euclidean Distance (L2)

Euclidean distance is the straight-line distance between two points—what you would measure with a ruler.

d(\vec{a}, \vec{b}) = \sqrt{\sum_{i=1}^{n} (a_i - b_i)^2}

In 2D, this is the Pythagorean theorem. In higher dimensions, the same formula applies: sum the squared differences, take the square root.

Interactive: Euclidean distance

For example, from a = (0, 0) to b = (3.0, 2.0):

d = √[(x₂-x₁)² + (y₂-y₁)²]
d = √[(3.0-0)² + (2.0-0)²] = √13 ≈ 3.606
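The formula is a one-liner in NumPy (a minimal sketch, assuming NumPy is installed):

```python
import numpy as np

a = np.array([0.0, 0.0])
b = np.array([3.0, 2.0])

# Sum the squared differences, then take the square root
d = np.sqrt(np.sum((a - b) ** 2))  # equivalent: np.linalg.norm(a - b)
print(round(d, 3))  # 3.606
```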

Properties:

  • Always non-negative
  • Zero only when vectors are identical
  • Symmetric: d(a, b) = d(b, a)
  • Satisfies triangle inequality: d(a, c) ≤ d(a, b) + d(b, c)

For normalized vectors: Euclidean distance relates directly to cosine similarity: d(\vec{a}, \vec{b}) = \sqrt{2 - 2\cos(\theta)}

This means ranking by Euclidean distance on normalized vectors gives the same order as cosine similarity.
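A quick check of this equivalence (a sketch with random unit vectors; the names `query` and `docs` are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
query = rng.normal(size=8)
docs = rng.normal(size=(5, 8))

# Normalize everything to unit length
query /= np.linalg.norm(query)
docs /= np.linalg.norm(docs, axis=1, keepdims=True)

cos_sim = docs @ query                      # higher = more similar
euc = np.linalg.norm(docs - query, axis=1)  # lower = more similar

# Descending similarity order matches ascending distance order
assert np.array_equal(np.argsort(-cos_sim), np.argsort(euc))
```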

Manhattan Distance (L1)

Manhattan distance sums the absolute differences per dimension—the distance if you could only travel along axis-aligned paths (like a taxi in Manhattan).

d(\vec{a}, \vec{b}) = \sum_{i=1}^{n} |a_i - b_i|

Interactive: Manhattan distance

For the same two points, Manhattan (L1) gives |Δx| + |Δy| = 5.00, while Euclidean (L2) gives √(Δx² + Δy²) = 3.61. Purple path: Manhattan distance (along axes). Orange dashed: Euclidean (straight line).

Properties:

  • Also called L1 distance, taxicab distance, or city block distance
  • More robust to outliers than Euclidean
  • Treats all dimensions equally regardless of correlation

Manhattan distance is less commonly used in semantic search but has applications when individual feature differences matter independently.
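The L1 formula in NumPy, using the same example points as above (a minimal sketch):

```python
import numpy as np

a = np.array([0.0, 0.0])
b = np.array([3.0, 2.0])

# Sum of absolute per-dimension differences
l1 = np.sum(np.abs(a - b))
print(l1)  # 5.0
```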

Lp Distances

Euclidean and Manhattan are special cases of the Lp norm:

d_p(\vec{a}, \vec{b}) = \left(\sum_{i=1}^{n} |a_i - b_i|^p\right)^{1/p}

  • L1 (p=1): Manhattan distance
  • L2 (p=2): Euclidean distance
  • L∞ (p=∞): Maximum difference across dimensions

Higher p emphasizes the largest differences. Lower p treats all differences more equally.
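The general formula fits in one function, with p = ∞ handled as the maximum difference (a sketch; the `lp_distance` name is just for illustration):

```python
import numpy as np

def lp_distance(a, b, p):
    """Minkowski (Lp) distance; p = np.inf gives the maximum difference."""
    diff = np.abs(np.asarray(a, dtype=float) - np.asarray(b, dtype=float))
    if np.isinf(p):
        return diff.max()
    return (diff ** p).sum() ** (1.0 / p)

a, b = (0, 0), (3, 2)
print(lp_distance(a, b, 1))       # 5.0 (Manhattan)
print(lp_distance(a, b, 2))       # ≈ 3.606 (Euclidean)
print(lp_distance(a, b, np.inf))  # 3.0 (max difference)
```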

Comparing Metrics

Interactive: Compare distance metrics

  Point   L2     L1     L∞
  A       2.24   3.00   2.00
  B       2.24   3.00   2.00
  C       2.55   3.00   2.50

  L2 rank: A, B, C
  L1 rank: A, B, C
  L∞ rank: A, B, C

Consider two points in 2D:

  • Point A: (0, 0)
  • Point B: (3, 4)

  Metric            Distance
  Euclidean (L2)    5
  Manhattan (L1)    7
  L∞ (Chebyshev)    4

Different metrics give different distances—but more importantly, they can give different rankings when comparing multiple candidates.
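Those three numbers can be verified directly (a minimal NumPy sketch):

```python
import numpy as np

a = np.array([0.0, 0.0])
b = np.array([3.0, 4.0])
diff = np.abs(a - b)

print(np.sqrt((diff ** 2).sum()))  # 5.0 -- Euclidean (L2)
print(diff.sum())                  # 7.0 -- Manhattan (L1)
print(diff.max())                  # 4.0 -- L∞ (Chebyshev)
```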

Squared Euclidean Distance

Computing square roots is expensive. Since we only need rankings, we often use squared Euclidean distance:

d^2(\vec{a}, \vec{b}) = \sum_{i=1}^{n} (a_i - b_i)^2

This preserves ranking order (distances are non-negative, so squaring is monotone: if d(a, q) < d(b, q), then d²(a, q) < d²(b, q)) while avoiding the square root computation.

Most vector databases use squared Euclidean distance internally, even if they report it as "Euclidean" in the API.
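A quick demonstration that dropping the square root leaves rankings unchanged (a sketch with random vectors):

```python
import numpy as np

rng = np.random.default_rng(1)
query = rng.normal(size=16)
docs = rng.normal(size=(100, 16))

d2 = np.sum((docs - query) ** 2, axis=1)  # squared Euclidean: no sqrt
d = np.sqrt(d2)                           # true Euclidean

# sqrt is monotone on non-negative values, so the orderings agree
assert np.array_equal(np.argsort(d2), np.argsort(d))
```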

Distance and Similarity Relationship

Converting between distance and similarity, with cosine similarity 0.800 as the example:

  Similarity (higher = more similar): cosine 0.800, dot product 0.800
  Distance (lower = more similar): cosine distance 0.200, Euclidean (normalized) 0.632

Conversion formulas (normalized vectors):

cosine_dist = 1 - cosine_sim
euclidean = √(2 - 2 × cosine_sim)

For normalized vectors, there are clean conversions:

  Similarity    Distance      Relationship
  Cosine sim    Cosine dist   dist = 1 - sim
  Dot product   -             same as cosine for normalized vectors
  -             Euclidean     dist² = 2(1 - cosine_sim)

When your algorithm requires distances but you want cosine-like behavior, use Euclidean distance on normalized vectors.
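The conversion formulas, checked against the example values above (a plain-Python sketch):

```python
import math

cosine_sim = 0.8
cosine_dist = 1 - cosine_sim
euclidean = math.sqrt(2 - 2 * cosine_sim)

print(round(cosine_dist, 3))  # 0.2
print(round(euclidean, 3))    # 0.632
```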

Which Metric to Choose?

For semantic search: Use cosine similarity (or equivalently, Euclidean on normalized vectors). This is what embedding models are trained for.

For image embeddings: Often L2 distance, but check model documentation.

For sparse vectors: L1 or cosine, depending on the embedding type.

For user/item embeddings: Dot product often works, as magnitude may carry meaning.

The choice depends on how the embeddings were trained. Most text embedding models optimize for cosine similarity, so use that metric.

Metric Spaces and Indexing

The choice of metric affects which indexing algorithms apply:

HNSW works with any metric satisfying the triangle inequality; in practice this covers L1, L2, and cosine (which, on normalized vectors, ranks identically to L2).

IVF (Inverted File) uses any metric for clustering and search.

LSH (Locality Sensitive Hashing) requires metric-specific hash functions. Random projection LSH assumes L2 or cosine.

Vector databases typically support multiple metrics. Choose at index creation time—changing later requires rebuilding.

Key Takeaways

  • Euclidean (L2) distance measures straight-line distance; it is sensitive to large differences
  • Manhattan (L1) distance sums absolute differences; it is more robust to outliers
  • Squared Euclidean distance preserves rankings while avoiding expensive square roots
  • For normalized vectors, Euclidean distance and cosine similarity give the same ranking
  • Choose the metric your embedding model was trained for—typically cosine for text
  • Metric choice affects which indexing algorithms can be used