Retrieval-Augmented Generation (RAG)
An architecture that enhances LLM responses by first retrieving relevant documents from a knowledge base, then using them as context for generation.
The Pipeline
1. User asks a question.
2. The question is embedded into a vector.
3. A vector database finds the most similar document chunks.
4. Retrieved chunks are inserted into the LLM's prompt as context.
5. The LLM generates an answer grounded in the retrieved information.
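The pipeline above can be sketched end to end. This is a toy illustration: the bag-of-words "embedding" and the hardcoded document list stand in for a learned embedding model and a real vector database, and the final prompt would be sent to an LLM rather than printed.

```python
import math
from collections import Counter

# Toy bag-of-words "embedding"; a real system would use a learned embedding model.
def embed(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Stand-in for a chunked knowledge base indexed in a vector database.
documents = [
    "RAG retrieves documents before generation.",
    "Vector databases index embeddings for similarity search.",
    "Fine-tuning updates model weights on new data.",
]

def retrieve(question, k=2):
    q = embed(question)  # step 2: embed the question
    # step 3: rank chunks by similarity and keep the top k
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(question):
    # step 4: insert retrieved chunks into the prompt as context
    context = "\n".join(retrieve(question))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

# Step 5 would pass this prompt to an LLM for grounded generation.
print(build_prompt("How does RAG use retrieved documents?"))
```

The `retrieve`/`build_prompt` names are hypothetical; the point is the shape of the flow, where retrieval happens before generation and its output becomes prompt context.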
Why RAG?
LLMs have knowledge cutoffs and can hallucinate. RAG provides fresh, verifiable information. It's cheaper than fine-tuning, works with any LLM, and the knowledge base can be updated without retraining.
Advanced Techniques
Hybrid search (combining keyword and semantic search), reranking retrieved results, recursive retrieval, query decomposition, and GraphRAG (using knowledge graphs for structured retrieval).
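One common way to merge the keyword and semantic result lists in hybrid search is reciprocal rank fusion (RRF), which scores each document by its rank position in every list. The document IDs and rankings below are illustrative:

```python
def reciprocal_rank_fusion(rankings, k=60):
    # rankings: one ranked list of document IDs per retriever.
    # Each document earns 1 / (k + rank) from every list it appears in,
    # so documents ranked well by multiple retrievers rise to the top.
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_ranking = ["d3", "d1", "d2"]    # e.g. BM25 results
semantic_ranking = ["d1", "d2", "d3"]   # e.g. vector-search results
print(reciprocal_rank_fusion([keyword_ranking, semantic_ranking]))
# → ['d1', 'd3', 'd2']
```

The constant `k` (conventionally 60) damps the influence of top ranks; fused results would then typically be passed to a reranker before reaching the LLM.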