Traditional search engines match keywords. When you search for "comfortable running shoes," they look for documents containing those exact words. Vector search engines take a fundamentally different approach: they understand meaning. A vector search for "comfortable running shoes" would also return results about "cushioned jogging footwear" because the concepts are semantically similar, even though they share no words. This capability, powered by embeddings and approximate nearest neighbor algorithms, has become the foundation of modern AI-powered search and retrieval-augmented generation.
From Words to Vectors
At the heart of vector search is the embedding: a numerical representation of text, images, or other data as a dense vector (array of numbers) in high-dimensional space. Modern embedding models, trained on massive text corpora, learn to place semantically similar items close together in this vector space. The sentence "The cat sat on the mat" and "A feline rested on the rug" would have very similar embeddings despite sharing few words.
Embedding models like OpenAI's text-embedding-3, Cohere's Embed, and open-source models like E5 and BGE produce vectors with hundreds to thousands of dimensions. The distance between two vectors in this space serves as a measure of semantic similarity.
Distance Metrics
- Cosine Similarity: Measures the angle between vectors, ignoring magnitude. Most common for text embeddings because it captures directional similarity
- Euclidean Distance (L2): Measures straight-line distance between points. Works well when vector magnitudes are meaningful
- Dot Product: Combines direction and magnitude. For unit-normalized embeddings it is equivalent to cosine similarity, which is why many systems normalize their vectors and use the cheaper dot product
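The three metrics above can be written in a few lines of pure Python; this is a minimal sketch for clarity, not an optimized implementation (production systems use vectorized libraries such as NumPy or the database's native kernels):

```python
import math

def dot(a, b):
    """Dot product: combines direction and magnitude."""
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Angle-based similarity, ignoring vector magnitude."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def euclidean_distance(a, b):
    """Straight-line (L2) distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]  # same direction as a, twice the magnitude

print(cosine_similarity(a, b))   # 1.0 -- identical direction, magnitude ignored
print(euclidean_distance(a, b))  # nonzero -- the magnitude difference still shows up
```

Note how `b` points in exactly the same direction as `a`: cosine similarity calls them identical while Euclidean distance does not, which is the practical difference between the two metrics.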
The Challenge of Scale
Computing the distance between a query vector and every vector in a database is straightforward but prohibitively slow at scale. With a billion vectors of 1536 dimensions, a brute-force search performs a billion full distance computations for every single query. This is where Approximate Nearest Neighbor (ANN) algorithms come in, trading a small amount of accuracy for enormous speed improvements.
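To make the cost concrete, here is exact brute-force search in pure Python; every query touches every stored vector, which is precisely the O(N·d) work that ANN indexes avoid:

```python
import heapq
import math
import random

def brute_force_knn(query, vectors, k=3):
    """Exact k-nearest-neighbor search: compare the query against
    every stored vector. Cost is O(N * d) per query."""
    def dist(i):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(query, vectors[i])))
    # heapq.nsmallest keeps the k closest indices without sorting everything
    return heapq.nsmallest(k, range(len(vectors)), key=dist)

random.seed(0)
db = [[random.random() for _ in range(8)] for _ in range(1000)]
print(brute_force_knn(db[42], db, k=3))  # index 42 is its own nearest neighbor
```

At a thousand 8-dimensional vectors this is instant; at a billion 1536-dimensional vectors, the same loop becomes the bottleneck the next sections address.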
HNSW (Hierarchical Navigable Small World)
HNSW builds a multi-layered graph where each layer connects vectors to their approximate neighbors. Search begins at the top layer (sparse, for coarse navigation) and descends through increasingly dense layers for finer search. HNSW provides excellent recall (typically 95-99%) with sub-millisecond query times on millions of vectors. It is the default index type in most vector databases.
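The full HNSW algorithm is intricate, but its core navigation step can be sketched as a greedy walk on a proximity graph. The single-layer toy below is an illustration only, not the real algorithm: actual HNSW maintains multiple layers, a dynamic candidate list (`ef`), and heuristic neighbor selection.

```python
import math
import random

def greedy_search(graph, vectors, query, entry):
    """Greedy walk: repeatedly move to whichever neighbor is closest to
    the query; stop at a local minimum. HNSW runs this on each layer,
    using the result as the entry point for the denser layer below."""
    def dist(i):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(vectors[i], query)))
    current = entry
    while True:
        closer = min(graph[current], key=dist)
        if dist(closer) >= dist(current):
            return current  # no neighbor improves on the current node
        current = closer

def build_graph(vectors, m=8):
    """Toy construction: connect each node to its m nearest neighbors
    (found by brute force here, just for the sketch)."""
    g = {}
    for i, v in enumerate(vectors):
        order = sorted(range(len(vectors)),
                       key=lambda j: sum((a - b) ** 2 for a, b in zip(v, vectors[j])))
        g[i] = order[1:m + 1]  # skip the node itself
    return g

random.seed(1)
vecs = [[random.random() for _ in range(4)] for _ in range(200)]
graph = build_graph(vecs)
# Walk from an arbitrary entry point toward vector 17's region:
result = greedy_search(graph, vecs, vecs[17], entry=0)
```

Each step only examines a handful of neighbors rather than the whole dataset; the hierarchy in real HNSW exists to give this greedy walk good long-range entry points so it rarely gets stuck in poor local minima.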
IVF (Inverted File Index)
IVF partitions the vector space into clusters using k-means, then searches only the nearest clusters. This dramatically reduces the search space. IVF-PQ combines clustering with product quantization (compressing vectors to use less memory), enabling billion-scale search on commodity hardware.
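The partition-then-probe idea can be sketched in a few lines. The snippet below uses randomly sampled vectors as stand-in centroids where a real index would run k-means, and omits the product quantization step:

```python
import math
import random

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def assign_to_clusters(vectors, centroids):
    """Partition vectors by nearest centroid -- the 'inverted file'."""
    lists = {i: [] for i in range(len(centroids))}
    for idx, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: l2(v, centroids[c]))
        lists[nearest].append(idx)
    return lists

def ivf_search(query, vectors, centroids, lists, nprobe=2, k=3):
    """Scan only the nprobe clusters whose centroids are closest to the
    query, then rank just those candidates exactly."""
    probe = sorted(range(len(centroids)), key=lambda c: l2(query, centroids[c]))[:nprobe]
    candidates = [i for c in probe for i in lists[c]]
    return sorted(candidates, key=lambda i: l2(query, vectors[i]))[:k]

random.seed(0)
vecs = [[random.random() for _ in range(8)] for _ in range(500)]
centroids = random.sample(vecs, 16)  # stand-in for learned k-means centroids
lists = assign_to_clusters(vecs, centroids)
print(ivf_search(vecs[7], vecs, centroids, lists))
```

With 16 clusters and `nprobe=2`, each query scans roughly an eighth of the data; the `nprobe` parameter is the recall-versus-speed dial.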
"The art of vector search is choosing the right tradeoff between recall, latency, and memory. HNSW favors recall and speed at the cost of memory. IVF-PQ favors memory efficiency at the cost of some recall."
Key Takeaway
Vector search combines embedding models that capture meaning with ANN algorithms that enable fast retrieval. The choice of embedding model determines search quality, while the choice of index determines speed and resource requirements.
Vector Databases Compared
Pinecone
Pinecone is a fully managed vector database that handles infrastructure, scaling, and index optimization automatically. It offers excellent developer experience with simple APIs and supports metadata filtering alongside vector search. Pinecone is ideal for teams that want to focus on application logic rather than infrastructure management.
Weaviate
Weaviate is an open-source vector database that can run self-hosted or as a managed service. Its distinguishing feature is built-in vectorization: you can insert raw text or images and Weaviate generates embeddings automatically using configured model providers. Weaviate also supports hybrid search combining vector similarity with keyword matching.
Milvus and Zilliz
Milvus is an open-source vector database designed for massive scale, supporting billions of vectors with distributed architecture. Zilliz Cloud provides a managed version. Milvus offers the most index types and fine-grained tuning options, making it suitable for teams with specific performance requirements.
Qdrant
Qdrant is written in Rust for performance and offers strong filtering capabilities alongside vector search. Its payload-based filtering is particularly efficient, making it well-suited for applications that need to combine semantic similarity with metadata constraints.
pgvector
For teams that want vector search without a separate database, pgvector adds vector similarity search to PostgreSQL. While it does not match the performance of dedicated vector databases at large scale, pgvector is an excellent choice for applications with fewer than a few million vectors where the operational simplicity of a single database is valuable.
Real-World Applications
Retrieval-Augmented Generation (RAG)
The most prominent application of vector search today is RAG, where a large language model's responses are grounded in retrieved documents. When a user asks a question, the system embeds the query, searches a vector database for relevant documents, and includes those documents in the LLM's context. This dramatically reduces hallucination and enables LLMs to answer questions about private or recent data.
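The retrieval step of that pipeline can be sketched as follows. The character-frequency `embed()` is a deliberately crude stand-in for a real embedding model, and the final LLM call is omitted; both would be API calls in practice:

```python
import string

def embed(text):
    """Toy embedding: normalized character frequencies. A real system
    would call an embedding model (e.g. via an embeddings API) here."""
    counts = [text.lower().count(c) for c in string.ascii_lowercase]
    total = sum(counts) or 1
    return [c / total for c in counts]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=2):
    """Embed the query, rank documents by similarity, return the top k."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

docs = [
    "Our refund policy allows returns within 30 days.",
    "The API rate limit is 100 requests per minute.",
    "Shipping is free on orders over $50.",
]
context = retrieve("What is the rate limit?", docs, k=1)
prompt = ("Answer using only this context:\n"
          + "\n".join(context)
          + "\n\nQ: What is the rate limit?")
# `prompt` would now be sent to the LLM, grounding its answer in `context`.
```

Swapping the toy `embed()` for a real embedding model and the document list for a vector database query turns this sketch into the standard RAG retrieval loop.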
Recommendation Systems
Embed users and items into the same vector space, then find items closest to a user's embedding for personalized recommendations. This approach captures nuanced preferences that collaborative filtering might miss.
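A toy sketch of the shared-space idea, with hand-made 3-dimensional "taste" vectors (an assumption for illustration; real systems learn these embeddings from interaction data):

```python
item_vecs = {
    "sci-fi novel": [0.9, 0.1, 0.0],
    "cookbook":     [0.0, 0.9, 0.1],
    "space opera":  [0.8, 0.0, 0.2],
    "baking guide": [0.1, 0.8, 0.1],
}

def recommend(user_vec, items, k=2):
    """Rank items by dot product with the user's embedding: items whose
    vectors point the same way as the user's score highest."""
    score = lambda name: sum(u, )  # placeholder replaced below
    score = lambda name: sum(u * i for u, i in zip(user_vec, items[name]))
    return sorted(items, key=score, reverse=True)[:k]

# A user who mostly reads science fiction:
print(recommend([0.85, 0.05, 0.1], item_vecs))  # -> ['sci-fi novel', 'space opera']
```

Because users and items live in the same space, recommendation reduces to the same nearest-neighbor query the rest of this article describes.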
Image and Multimodal Search
Models like CLIP produce embeddings that place images and text in the same vector space. This enables searching for images using natural language descriptions, or finding visually similar images, without any manual tagging.
Best Practices
- Choose embeddings carefully: The embedding model determines search quality. Test multiple models on your specific domain and evaluate with real user queries
- Chunk documents thoughtfully: For text search, how you split documents into chunks significantly affects retrieval quality. Experiment with chunk sizes and overlap
- Combine vector and keyword search: Hybrid search (combining semantic similarity with BM25 keyword matching) often outperforms either approach alone
- Filter before or during search: Use metadata filtering to narrow the search space when applicable, improving both relevance and performance
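As a minimal illustration of the chunking practice above, here is character-based splitting with overlap; production systems more often split on tokens or sentence boundaries, but the size/overlap tradeoff is the same:

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into fixed-size character chunks. The overlap ensures
    a sentence cut at one chunk's boundary still appears whole in the
    neighboring chunk, so it remains retrievable."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

doc = "Vector search retrieves by meaning rather than keywords. " * 20
chunks = chunk_text(doc, chunk_size=100, overlap=20)
# Adjacent chunks share their boundary region:
assert chunks[0][-20:] == chunks[1][:20]
```

Smaller chunks give more precise matches but less context per retrieved passage; larger chunks do the reverse, which is why the parameters are worth tuning against real queries.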
Vector search has rapidly matured from a research technique to production infrastructure. As AI applications increasingly rely on retrieval, understanding how semantic search works and how to optimize it is essential for building effective AI systems.
Key Takeaway
Vector search engines enable AI applications to find information by meaning rather than keywords. Choose your embedding model for quality, your vector database for operational needs, and always evaluate with real queries from your domain.
