The Brute Force Baseline
Exact nearest neighbors: when it works and when it doesn't
Before exploring sophisticated indexing algorithms, we should understand the baseline: brute force search. Compute the similarity between the query and every vector in the database. Return the top k. Simple, exact, and often surprisingly viable.
The Algorithm
Brute force nearest neighbor search is straightforward:
```
function search(query, vectors, k):
    scores = []
    for each vector in vectors:
        score = similarity(query, vector)
        scores.append((score, vector))
    sort scores descending
    return top k scores
```
No preprocessing. No index structure. Just compute and compare.
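The pseudocode above can be made concrete with NumPy. This is a minimal sketch assuming dot-product similarity on L2-normalized vectors (i.e., cosine similarity); `argpartition` avoids a full sort of all n scores:

```python
import numpy as np

def brute_force_search(query, vectors, k):
    """Return indices and scores of the k most similar vectors (dot product)."""
    scores = vectors @ query                   # one score per stored vector
    top = np.argpartition(-scores, k)[:k]      # top-k candidates, unordered
    return top[np.argsort(-scores[top])]       # sort only the k candidates

rng = np.random.default_rng(0)
vecs = rng.standard_normal((1000, 64)).astype(np.float32)
vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)  # normalize -> cosine sim

q = vecs[42]                                   # a vector is its own nearest neighbor
idx = brute_force_search(q, vecs, 5)
assert idx[0] == 42
```

Sorting only the k winners instead of all n scores is the one optimization worth doing even in a baseline: it drops the per-query cost from O(n log n) to O(n + k log k).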
Complexity Analysis
Time complexity: O(n × d) per query
- n = number of vectors
- d = embedding dimension
For 1 million vectors of 768 dimensions, that is 768 million multiply-accumulate operations per query. With optimized SIMD instructions, a modern CPU sustains billions of operations per second, so a rough estimate is tens to hundreds of milliseconds per query.
Space complexity: O(n × d)
- Just storing the vectors
- No additional index overhead
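As a sanity check on these numbers, the per-query work and memory footprint for 1 million 768-dimensional float32 vectors works out as:

```python
n, d = 1_000_000, 768

# Time: one multiply-accumulate per dimension per vector
ops_per_query = n * d
assert ops_per_query == 768_000_000

# Space: float32 is 4 bytes per dimension
bytes_total = n * d * 4
print(f"{bytes_total / 2**30:.2f} GiB")        # about 2.86 GiB
```

The ~3 GiB footprint is worth noting: at this scale the vectors still fit in RAM on a modest machine, which is part of why brute force remains viable longer than intuition suggests.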
When Brute Force Works
Brute force is the right choice more often than you might think:
- **Small collections (< 10K vectors).** At 10,000 vectors, brute force takes single-digit milliseconds. Index overhead is not worth it.
- **High accuracy requirements.** Approximate algorithms trade accuracy for speed. When 100% recall is mandatory, brute force is the only guarantee.
- **Infrequent queries.** If you run one query per minute, 100ms latency is fine. Index building time would never pay off.
- **Rapid prototyping.** Building an index requires choosing parameters, tuning, and evaluating trade-offs. Brute force gets you started immediately.
- **Filtered queries with small result sets.** If metadata filters reduce candidates to 1,000 vectors, brute force on that subset is fast.
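The filtered case can be sketched with a boolean metadata mask (the mask and its selectivity here are hypothetical): apply the filter first, then score only the survivors.

```python
import numpy as np

def filtered_search(query, vectors, mask, k):
    """Brute force only over rows that pass the metadata filter."""
    candidates = np.flatnonzero(mask)          # indices surviving the filter
    scores = vectors[candidates] @ query       # score just the subset
    order = np.argsort(-scores)[:k]
    return candidates[order], scores[order]

rng = np.random.default_rng(1)
vecs = rng.standard_normal((50_000, 32)).astype(np.float32)
mask = rng.random(50_000) < 0.02               # filter keeps roughly 1,000 rows
q = rng.standard_normal(32).astype(np.float32)

ids, scores = filtered_search(q, vecs, mask, 10)
assert mask[ids].all()                         # every result respects the filter
```

Note that many ANN indices handle filters poorly (the filter invalidates precomputed graph or cluster structure), so highly selective filters are a genuine argument for brute force.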
When Brute Force Fails
Scaling behavior: brute force is O(n) per query, while HNSW is O(log n). The gap widens rapidly with collection size; at 1 million vectors, n / log2(n) is roughly 50,000x.
- **Large collections (> 100K vectors).** At 1 million vectors, queries take hundreds of milliseconds. At 100 million, they take seconds. User experience suffers.
- **High query throughput.** 100 queries per second at 100ms each means you need 10 CPUs just for search. Approximate methods are 10-100x faster.
- **Real-time requirements.** If P99 latency must be < 20ms, brute force fails at moderate scale.
- **Cost constraints.** CPU time costs money. Approximate search on the same hardware handles 10-100x more QPS.
Exact vs Approximate
[Figure: exact vs approximate trade-off — found (95%) vs missed (5%)]
At 95% recall, 5 of 100 truly relevant results are missed. Usually acceptable.
Key insight: The items missed are typically those ranked just outside the top-k. If you need the top 10 and the 11th best item replaces the 10th, the difference in relevance is usually negligible.
The fundamental trade-off:
| Approach | Recall | Latency | Build Time |
|---|---|---|---|
| Brute force | 100% | O(n) | 0 |
| HNSW | 95-99% | O(log n) | O(n log n) |
| IVF | 90-98% | O(n/p) for p partitions | O(n) |
Approximate algorithms miss some true neighbors. But if the top-10 by exact search includes items ranked 1, 2, 3, 4, 5, 6, 7, 8, 10, 12 by approximate search, does it matter? For most applications, no.
Optimizing Brute Force
If you need brute force at scale, optimization helps:
SIMD vectorization: Process multiple dimensions simultaneously. Libraries like FAISS use AVX-512 for 16x throughput on compatible CPUs.
GPU acceleration: GPUs excel at parallel dot products. A single GPU can brute-force millions of vectors in milliseconds.
Batching: Process multiple queries together. Better cache utilization, amortized overhead.
Quantization: Reduce precision from float32 to int8. 4x less memory, faster computation. Small accuracy loss.
Filtering first: Apply metadata filters before distance computation. Fewer vectors to compare.
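Of these, batching is the cheapest win in practice. A sketch, assuming normalized vectors: scoring a whole batch of queries is a single matrix multiply, which lets the BLAS backend exploit SIMD and cache blocking that a per-query loop cannot.

```python
import numpy as np

def batch_search(queries, vectors, k):
    """Score many queries at once: one matmul instead of a Python loop."""
    scores = queries @ vectors.T               # shape (num_queries, n)
    return np.argsort(-scores, axis=1)[:, :k]  # top-k indices per query

rng = np.random.default_rng(2)
vecs = rng.standard_normal((10_000, 64)).astype(np.float32)
vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)

qs = vecs[[3, 7, 11]]                          # queries that exist in the index
top = batch_search(qs, vecs, 5)
assert list(top[:, 0]) == [3, 7, 11]           # each finds itself first
```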
Brute Force as Reranking
A common pattern: use approximate search to get 100 candidates, then brute force rerank to get the final top 10.
This combines:
- Speed of approximate search for the initial filter
- Accuracy of exact scoring for the final ranking
Approximate methods have highest error for items near the decision boundary. Reranking with exact scores recovers precision where it matters most.
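A sketch of the pattern, assuming some ANN index has already produced a candidate shortlist (simulated here with random ids): the exact rescoring touches ~100 rows instead of millions, so its cost is negligible.

```python
import numpy as np

def rerank(query, vectors, candidate_ids, k):
    """Exact rescoring of an ANN shortlist with full-precision dot products."""
    candidate_ids = np.asarray(candidate_ids)
    scores = vectors[candidate_ids] @ query    # brute force over ~100 rows
    order = np.argsort(-scores)[:k]
    return candidate_ids[order]

rng = np.random.default_rng(3)
vecs = rng.standard_normal((100_000, 64)).astype(np.float32)
q = rng.standard_normal(64).astype(np.float32)

candidates = rng.choice(100_000, size=100, replace=False)  # stand-in for ANN output
final = rerank(q, vecs, candidates, 10)
assert set(final) <= set(candidates)           # reranking only reorders, never adds
```

The same pattern applies when the index stores quantized vectors: the ANN stage searches compressed representations, and reranking rescores the shortlist against the original float32 vectors.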
Establishing Your Baseline
Before implementing any indexing, establish a brute force baseline:
- Measure latency: How long do queries take?
- Measure throughput: How many QPS can you handle?
- Define requirements: What latency and recall do you need?
- Evaluate need: Is brute force actually insufficient?
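A minimal harness for the first two measurements might look like this (collection size, dimension, and query count are placeholders; substitute your real data and similarity function):

```python
import time
import numpy as np

rng = np.random.default_rng(4)
vectors = rng.standard_normal((100_000, 128)).astype(np.float32)
queries = rng.standard_normal((100, 128)).astype(np.float32)

start = time.perf_counter()
for q in queries:
    scores = vectors @ q
    top10 = np.argpartition(-scores, 10)[:10]  # top-10, unordered
elapsed = time.perf_counter() - start

print(f"avg latency: {elapsed / len(queries) * 1e3:.2f} ms")
print(f"throughput:  {len(queries) / elapsed:.0f} QPS")
```

Compare the printed numbers against your latency and recall requirements before deciding an index is needed at all.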
Many projects implement complex indexing that was never necessary. Start simple. Optimize when measurements demand it.
Key Takeaways
- Brute force search computes similarity between the query and every vector—O(n × d) per query
- It is the right choice for small collections, high accuracy needs, or rapid prototyping
- It fails at scale: 100K+ vectors with latency or throughput requirements
- Approximate methods trade small recall losses for 10-100x speedups
- Optimize brute force with SIMD, GPU, batching, or quantization if needed
- Use brute force as a reranking step after approximate retrieval
- Always establish a brute force baseline before building complex indices