What are Embeddings?
Embeddings are dense numerical vector representations of data -- words, sentences, images, or any other entity -- positioned in a high-dimensional space so that items with similar meaning are close together. They are the backbone of modern AI search, recommendation, and retrieval systems.
The Core Idea: Meaning as Numbers
Computers cannot understand the concept of "king" or "cat" the way humans do. To bridge this gap, AI models learn to represent each piece of data as a list of numbers -- a vector. These vectors are not random. They are carefully trained so that the geometric relationships between vectors mirror the semantic relationships between the concepts they represent.
The classic example: in a well-trained embedding space, the vector for "king" minus "man" plus "woman" produces a vector very close to "queen." The model has captured the analogy king:man::queen:woman purely through numerical geometry.
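The arithmetic can be demonstrated with hand-picked toy vectors. Real embeddings are learned from data, not designed, and their dimensions have no clean interpretation; the two dimensions below (roughly "royalty" and "male") are made up purely to make the geometry visible.

```python
import numpy as np

# Toy 2-D embeddings chosen by hand so the analogy works exactly.
# Dimension 0 ~ "royalty", dimension 1 ~ "male" (illustrative only;
# real models learn hundreds of opaque dimensions from data).
vectors = {
    "king":  np.array([1.0, 1.0]),
    "man":   np.array([0.0, 1.0]),
    "woman": np.array([0.0, 0.0]),
    "queen": np.array([1.0, 0.0]),
}

result = vectors["king"] - vectors["man"] + vectors["woman"]

# Find the word whose vector is nearest to the result.
nearest = min(vectors, key=lambda w: np.linalg.norm(vectors[w] - result))
print(nearest)  # -> queen
```

This nearest-neighbor lookup after vector arithmetic is exactly how analogy benchmarks evaluate embedding spaces.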
Why Not One-Hot Encoding?
The naive approach is to assign each word a unique ID (one-hot encoding): "cat" = [0,0,1,0,0,...]. But this creates sparse, enormous vectors with no notion of similarity. The vectors for "cat" and "kitten" are just as different as "cat" and "airplane." Embeddings solve this by compressing meaning into dense, low-dimensional vectors where similar concepts cluster together.
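A quick sketch of why one-hot vectors carry no similarity signal: every pair of distinct one-hot vectors is orthogonal, so their dot product is identically zero no matter how related the words are.

```python
import numpy as np

# One-hot vectors over a tiny 5-word vocabulary.
vocab = ["the", "cat", "kitten", "sat", "airplane"]

def one_hot(word):
    v = np.zeros(len(vocab))
    v[vocab.index(word)] = 1.0
    return v

# Every pair of distinct one-hot vectors has dot product 0:
# "cat" is exactly as dissimilar to "kitten" as to "airplane".
print(one_hot("cat") @ one_hot("kitten"))    # 0.0
print(one_hot("cat") @ one_hot("airplane"))  # 0.0
```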
Embeddings in Action
Here is a simplified view of how words map to vectors and how similarity works. Real embeddings have hundreds or thousands of dimensions; we show only a few for clarity.
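A minimal sketch with made-up 4-dimensional vectors (the values are invented to illustrate the geometry, not taken from any real model):

```python
import numpy as np

# Illustrative 4-D vectors; real models use hundreds of dimensions.
embeddings = {
    "cat":    np.array([0.80, 0.65, 0.10, 0.05]),
    "kitten": np.array([0.78, 0.70, 0.12, 0.04]),
    "car":    np.array([0.05, 0.10, 0.90, 0.70]),
}

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(embeddings["cat"], embeddings["kitten"]))  # high (~0.999)
print(cosine(embeddings["cat"], embeddings["car"]))     # low (~0.19)
```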
"Cat" and "kitten" have very similar vectors (high cosine similarity), reflecting their related meanings. "Car" is far away in vector space, reflecting its unrelated meaning. This is the power of embeddings: meaning becomes measurable distance.
Visualizing Embedding Space
In a 2D projection of embedding space, semantically related words cluster together. Animals form one region, vehicles another, food a third. The actual embedding spaces used by modern models have 768 to 3,072 dimensions, capturing far more nuanced relationships than any 2D visualization can show.
Embedding Models: From Word2Vec to Modern Transformers
The science of embeddings has evolved dramatically. Here are the landmark models that shaped the field.
Word2Vec (2013)
The breakthrough by Google that popularized the idea of word embeddings. Word2Vec trains a shallow neural network on a simple task: predict a word from its neighbors (CBOW) or predict neighbors from a word (Skip-gram). The hidden layer weights become the word vectors.
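The Skip-gram side of this task can be sketched as generating (center, context) training pairs with a sliding window; the shallow network trained on these pairs is omitted here, and the tokenization is deliberately simplistic.

```python
# Generate the (center, context) pairs Skip-gram is trained to predict.
def skipgram_pairs(tokens, window=2):
    pairs = []
    for i, center in enumerate(tokens):
        # Look `window` tokens to each side of the center word.
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

sentence = "the cat sat on the mat".split()
print(skipgram_pairs(sentence, window=1))
# [('the', 'cat'), ('cat', 'the'), ('cat', 'sat'), ...]
```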
Limitation: Each word gets exactly one vector regardless of context. "Bank" (financial) and "bank" (river) share the same embedding.
GloVe (2014)
Stanford's "Global Vectors for Word Representation" takes a different approach: it builds a co-occurrence matrix of the entire corpus and factorizes it. The resulting vectors capture both local (window-based) and global (corpus-wide) statistical patterns.
Strength: Often produces better results for analogy tasks than Word2Vec. Pre-trained GloVe vectors (6B, 42B, 840B tokens) remain widely used as baseline features.
Contextual Embeddings (BERT, 2018)
BERT revolutionized embeddings by making them context-dependent. The same word gets different vectors depending on its surrounding sentence. "I went to the bank to deposit money" and "I sat on the river bank" produce different vectors for "bank."
How: Uses the Transformer architecture with bidirectional attention. Each token's embedding is a function of the entire input sequence, not just a static lookup.
Sentence Embeddings (2019+)
Models like Sentence-BERT (SBERT) and Sentence-Transformers are specifically trained to produce high-quality embeddings for entire sentences and paragraphs, not just individual words. They optimize for semantic similarity: sentences with similar meaning get similar vectors.
Impact: Enabled practical semantic search, where you can find documents by meaning rather than keyword matching. This is the foundation of modern RAG systems.
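Many Sentence-Transformers models build the sentence vector by mean-pooling the encoder's contextual token vectors. The sketch below uses random stand-ins for the token vectors, since running a real encoder is beyond a toy example; the pooling and normalization steps are the real recipe.

```python
import numpy as np

# Random stand-ins for a real encoder's per-token output:
# 7 tokens, each a 384-dimensional contextual vector.
rng = np.random.default_rng(0)
token_vectors = rng.normal(size=(7, 384))

# Mean-pool the token vectors into one sentence vector.
sentence_vector = token_vectors.mean(axis=0)

# Normalize so cosine similarity reduces to a dot product.
sentence_vector /= np.linalg.norm(sentence_vector)
print(sentence_vector.shape)  # (384,)
```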
Modern Embedding Models
Today, several providers offer state-of-the-art embedding models optimized for production use. Choosing the right one depends on your use case, language requirements, and budget.
| Model / Provider | Dimensions | Key Strength | Best For |
|---|---|---|---|
| OpenAI text-embedding-3-large | 3,072 | Top accuracy on benchmarks, supports dimension reduction | General-purpose semantic search and RAG |
| Cohere embed-v3 | 1,024 | Multilingual (100+ languages), compression-friendly | Multilingual search and classification |
| sentence-transformers (all-MiniLM-L6-v2) | 384 | Fast, lightweight, runs locally, open-source | On-device or budget-constrained semantic search |
| Voyage AI voyage-large-2 | 1,536 | Optimized for code and technical content | Code search and documentation retrieval |
| BGE / E5 (open-source) | 768 - 1,024 | Competitive accuracy, free to use, self-hostable | Cost-sensitive production deployments |
How Embeddings Power Modern AI Applications
Embeddings are not just a theoretical concept. They are the engine behind many of the AI features you use every day.
Semantic Search
Traditional keyword search fails when users and documents use different words for the same concept. Embedding-based search converts both the query and documents into vectors, then finds documents whose vectors are closest to the query vector. A search for "how to fix a flat tire" matches a document titled "changing a punctured wheel" because their embeddings are similar.
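The ranking step can be sketched with hand-made vectors standing in for model output; in a real system both the query and the documents would be run through the same embedding model.

```python
import numpy as np

# Toy semantic search: rank documents by cosine similarity to the query.
docs = {
    "changing a punctured wheel": np.array([0.90, 0.10, 0.20]),
    "baking sourdough bread":     np.array([0.10, 0.90, 0.30]),
}
query = np.array([0.85, 0.15, 0.25])   # "how to fix a flat tire"

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

ranked = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
print(ranked[0])  # -> changing a punctured wheel
```

Note that no word overlaps between the query and the top result; only the vectors are close.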
Retrieval-Augmented Generation (RAG)
RAG systems use embeddings to retrieve relevant documents from a knowledge base before passing them to an LLM for answer generation. The user's question is embedded, the most similar document chunks are retrieved via vector search, and the LLM generates an answer grounded in those specific documents. This dramatically reduces hallucinations.
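The retrieval step can be sketched as below. The chunk and question vectors are hand-made stand-ins; in a real pipeline they come from an embedding model, and the assembled prompt is sent to an LLM for generation.

```python
import numpy as np

# Hand-made vectors standing in for embedded knowledge-base chunks.
chunks = [
    "Refund policy: returns accepted within 30 days.",
    "Shipping takes 3-5 business days.",
    "Express shipping is available for a fee.",
]
chunk_vecs = np.array([
    [1.0, 0.0, 0.0],   # "refunds" direction
    [0.0, 1.0, 0.0],   # "shipping" direction
    [0.0, 0.9, 0.4],   # mostly shipping
])
chunk_vecs /= np.linalg.norm(chunk_vecs, axis=1, keepdims=True)

question_vec = np.array([0.1, 0.99, 0.05])   # "how long is delivery?"
question_vec /= np.linalg.norm(question_vec)

scores = chunk_vecs @ question_vec           # cosine sim (all normalized)
top_k = np.argsort(scores)[::-1][:2]         # indices of the 2 best chunks

context = "\n".join(chunks[i] for i in top_k)
print("Answer using only this context:\n" + context)
```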
Recommendation Systems
Streaming services, e-commerce platforms, and social media all use embeddings to match users with content. Both users and items are embedded in the same vector space. Recommendations are generated by finding items whose vectors are closest to the user's preference vector.
Image and Multimodal Search
Models like CLIP embed both images and text into the same vector space. This enables searching images with text queries ("sunset over mountains") and finding visually similar images. The same principle extends to audio, video, and cross-modal retrieval.
Clustering and Classification
Once data is embedded, standard machine learning techniques can be applied to the vectors. Clustering embeddings reveals natural groupings in text data (topic discovery). Training a simple classifier on top of embeddings often achieves results competitive with much larger fine-tuned models.
Duplicate and Anomaly Detection
Embeddings make it easy to find near-duplicates in large datasets by comparing vector distances. Support tickets, product listings, and research papers can be deduplicated at scale. Anomaly detection works similarly: items whose vectors are far from all clusters may be outliers or errors.
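Deduplication reduces to thresholding pairwise similarities. The vectors below are made-up stand-ins for ticket embeddings, and the 0.95 threshold is an illustrative choice, not a universal constant.

```python
import numpy as np

tickets = [
    "app crashes on login",
    "application crashes when logging in",
    "how do I reset my password",
]
vecs = np.array([
    [0.90, 0.10, 0.10],
    [0.88, 0.12, 0.11],
    [0.10, 0.90, 0.30],
])
vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)

sim = vecs @ vecs.T                      # pairwise cosine similarities
dupes = [(i, j)
         for i in range(len(tickets))
         for j in range(i + 1, len(tickets))
         if sim[i, j] > 0.95]            # flag near-duplicate pairs
print(dupes)  # -> [(0, 1)]
```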
Measuring Similarity: Cosine Similarity
The most common way to measure how similar two embeddings are is cosine similarity. It measures the angle between two vectors, ignoring their magnitude. A cosine similarity of 1.0 means the vectors point in the exact same direction (identical meaning). A value of 0 means they are orthogonal (unrelated). A value of -1 means they point in opposite directions.
The Formula
Cosine Similarity = (A · B) / (||A|| × ||B||), where A · B is the dot product and ||A|| is the magnitude of vector A. In practice, most embedding models produce normalized vectors, so cosine similarity simplifies to just the dot product, making it extremely fast to compute even at scale with millions of vectors.
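Both forms of the computation, with arbitrary example vectors:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 5.0])

# Full formula: dot product divided by the product of magnitudes.
cos_full = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Shortcut: normalize once, then cosine similarity is just a dot product.
a_hat = a / np.linalg.norm(a)
b_hat = b / np.linalg.norm(b)
cos_fast = np.dot(a_hat, b_hat)

print(round(float(cos_full), 6), round(float(cos_fast), 6))  # identical
```

The shortcut matters at scale: with pre-normalized vectors, scoring a query against millions of stored embeddings is a single matrix-vector product.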