AI Glossary

Prefix Caching

Reusing computed KV cache from shared prompt prefixes across multiple LLM requests.

Overview

Prefix caching is an LLM serving optimization that stores and reuses the key-value (KV) cache computed for common prompt prefixes. When multiple requests share the same system prompt or context (which is common in production), the KV cache for the shared prefix is computed once and reused, avoiding redundant computation.
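The mechanism can be sketched as a cache keyed by a hash of the prompt's token prefix: on a hit, the stored KV entries are reused and only the new suffix is prefilled. This is a minimal illustrative sketch, not any specific serving framework's implementation; `compute_kv` is a placeholder for the real per-token attention KV computation.

```python
import hashlib

def compute_kv(tokens):
    """Stand-in for prefill: one placeholder (key, value) pair per token."""
    return [(f"K({t})", f"V({t})") for t in tokens]

class PrefixCache:
    """Toy prefix cache keyed by a hash of the token prefix."""

    def __init__(self):
        self.store = {}      # prefix hash -> cached KV list
        self.recomputed = 0  # tokens actually prefilled (for illustration)

    def _key(self, tokens):
        return hashlib.sha256("|".join(tokens).encode()).hexdigest()

    def prefill(self, prefix, suffix):
        """Reuse KV for a shared prefix; compute KV only for the suffix."""
        k = self._key(prefix)
        if k in self.store:
            prefix_kv = self.store[k]       # hit: prefix KV reused as-is
        else:
            prefix_kv = compute_kv(prefix)  # miss: full prefill, then cache
            self.recomputed += len(prefix)
            self.store[k] = prefix_kv
        self.recomputed += len(suffix)      # the suffix is always computed
        return prefix_kv + compute_kv(suffix)
```

With a shared three-token system prompt, the first request prefills all four tokens while the second recomputes only its one-token suffix, reusing the cached prefix KV. Production systems refine this with block-level (paged) caching and eviction policies, but the lookup-and-reuse core is the same.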

Key Details

This can reduce time-to-first-token by 50-90% for requests with long shared prefixes (such as system prompts or few-shot examples). Providers like Anthropic and OpenAI expose this capability as prompt caching in their APIs. It is particularly valuable for applications with consistent system prompts, RAG systems with cached document contexts, and batch processing of similar queries.
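The savings scale with how much of the prompt is shared. A rough back-of-the-envelope estimate, assuming prefill cost roughly proportional to the number of tokens that must be processed (a simplification that ignores attention's quadratic term):

```python
def prefill_savings(prefix_tokens, total_tokens):
    """Fraction of prefill work skipped on a cache hit, assuming cost
    scales roughly with the number of tokens that must be processed."""
    return prefix_tokens / total_tokens

# A 1800-token cached system prompt inside a 2000-token prompt:
# about 90% of the prefill work is skipped on a cache hit.
```

This is why the benefit is largest for long, stable system prompts or few-shot blocks followed by short per-request suffixes.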

Related Concepts

KV Cache · Inference Optimization · Continuous Batching


Last updated: March 5, 2026