AI Glossary

Information Retrieval

The field of finding relevant documents or passages in a large collection in response to a user query. It is fundamental to search engines and retrieval-augmented generation (RAG) systems.

Traditional Methods

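TF-IDF: Scores documents by term frequency weighted by inverse document frequency, so terms that are rare across the collection count for more. BM25: A ranking function that refines TF-IDF with term-frequency saturation and document-length normalization; it is the default scoring function in Elasticsearch. Both are fast and effective for exact term matching, but they do not match synonyms or paraphrases.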
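To make the keyword scoring concrete, here is a minimal sketch of BM25 over a tiny in-memory corpus. It assumes documents are already tokenized into word lists, uses a non-negative IDF variant, and picks illustrative values for the parameters `k1` and `b`; the function name `bm25_scores` and the sample corpus are made up for this example and do not come from any particular library.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document in `docs` against a tokenized `query`.

    Assumes `docs` is non-empty. k1 controls term-frequency saturation and
    b controls document-length normalization (illustrative defaults).
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue  # terms absent from the document contribute nothing
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical three-document corpus, whitespace-tokenized.
docs = [
    "the cat sat on the mat".split(),
    "dogs chase cats in the park".split(),
    "bm25 ranks documents by keyword overlap".split(),
]
scores = bm25_scores("bm25 keyword".split(), docs)
best = max(range(len(docs)), key=scores.__getitem__)
```

Only the third document shares terms with the query, so it is the only one with a non-zero score. Note that "cats" in the second document would not match a query for "cat": exact term matching is what this family of methods does.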

Neural Retrieval

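Dense retrieval: Encode queries and documents as vectors with an embedding model, then return the documents whose vectors are nearest to the query vector. Cross-encoders: Score each query-document pair jointly in a single model pass, which is more accurate but too slow to run over an entire collection. Hybrid: Combine BM25 with dense retrieval to cover both exact term matches and semantic matches.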
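The sketch below illustrates dense retrieval as a nearest-neighbor search by cosine similarity, followed by one possible way to build a hybrid ranking. The entry does not say how the two result lists should be combined; reciprocal rank fusion (RRF) is used here as one common choice, not as the only one. The vectors are toy values standing in for encoder outputs, and the keyword ranking is a hypothetical BM25 result.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def dense_rank(query_vec, doc_vecs):
    """Return document indices sorted by descending cosine similarity."""
    sims = [cosine(query_vec, v) for v in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: sims[i], reverse=True)


def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: each ranking adds 1 / (k + rank) per document."""
    fused = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)


# Toy 3-dimensional vectors standing in for real embeddings (made up).
doc_vecs = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.7, 0.6, 0.1]]
query_vec = [1.0, 0.2, 0.0]

dense_ranking = dense_rank(query_vec, doc_vecs)
keyword_ranking = [2, 1, 0]  # hypothetical BM25 ordering for the same query
hybrid_ranking = rrf_fuse([keyword_ranking, dense_ranking])
```

RRF works on ranks rather than raw scores, which sidesteps the fact that BM25 scores and cosine similarities live on different scales. A real system would typically replace the exhaustive loop in `dense_rank` with an approximate nearest-neighbor index.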

Modern Pipeline

Retrieve candidates with fast methods (BM25 plus dense retrieval), rerank the top-k candidates with a cross-encoder, then pass the best-ranked passages to an LLM for answer generation. This retrieve-rerank-generate pipeline is the backbone of many production RAG systems.
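The three stages can be sketched as a single function that takes each component as a plain callable. Everything here is illustrative: `rag_answer` and its parameters are hypothetical names, the word-overlap scorer stands in for a real cross-encoder, and the generator just echoes its context instead of calling an LLM.

```python
from typing import Callable, List, Sequence


def rag_answer(
    query: str,
    retrievers: Sequence[Callable[[str, int], List[str]]],
    rerank_score: Callable[[str, str], float],
    generate: Callable[[str, List[str]], str],
    n_candidates: int = 20,
    top_k: int = 3,
) -> str:
    # Stage 1: gather candidates from every fast retriever, de-duplicated.
    candidates: List[str] = []
    for retrieve in retrievers:
        for passage in retrieve(query, n_candidates):
            if passage not in candidates:
                candidates.append(passage)
    # Stage 2: rerank candidates with the slower, more accurate scorer.
    ranked = sorted(candidates, key=lambda p: rerank_score(query, p), reverse=True)
    # Stage 3: hand only the best passages to the generator.
    return generate(query, ranked[:top_k])


# --- Hypothetical stand-ins for real components ---
CORPUS = [
    "BM25 is a keyword ranking function.",
    "Dense retrieval encodes text as vectors.",
    "Cross-encoders score a query and passage jointly.",
]


def words(text: str) -> set:
    return set(text.lower().rstrip(".").split())


def keyword_retriever(query: str, n: int) -> List[str]:
    # Returns passages sharing at least one word with the query.
    return [p for p in CORPUS if words(query) & words(p)][:n]


def overlap_score(query: str, passage: str) -> float:
    # Stand-in for a cross-encoder: counts shared words.
    return float(len(words(query) & words(passage)))


def echo_generate(query: str, passages: List[str]) -> str:
    # Stand-in for an LLM call: echoes the top passage as context.
    context = passages[0] if passages else "none"
    return f"Q: {query} | context: {context}"


answer = rag_answer(
    "what is dense retrieval",
    [keyword_retriever],
    overlap_score,
    echo_generate,
    top_k=1,
)
```

Passing the stages in as callables keeps the control flow visible and lets each component be swapped independently, for example adding a dense retriever to the `retrievers` list without touching the reranking or generation steps.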

Last updated: March 5, 2026