Long-Context Model
An LLM designed to process very long input sequences, typically ranging from 128K tokens to over 1 million tokens.
Overview
Long-context models are language models that can handle input sequences far beyond the few-thousand-token limits of earlier models. Models such as Claude (200K tokens), Gemini 1.5 (1M+ tokens), and GPT-4o (128K tokens) can ingest entire books, codebases, or document collections in a single prompt.
Technical Approaches
Enabling long contexts requires architectural innovations: rotary position embeddings (RoPE) with scaling, FlashAttention for memory-efficient attention computation, ring attention for distributing attention across devices, and efficient KV-cache management. The 'needle in a haystack' test evaluates how reliably a model recalls a specific fact placed at different positions within a long context.
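The RoPE scaling idea mentioned above can be illustrated with position interpolation, one common scaling scheme: positions are divided by a scale factor so that a model trained on a shorter window can address proportionally longer sequences. The sketch below is a minimal NumPy illustration of the rotary mechanism, not any particular model's implementation; function names and the choice of base are assumptions.

```python
import numpy as np

def rope_angles(positions, dim, base=10000.0, scale=1.0):
    # Position interpolation: dividing positions by `scale` squeezes a
    # longer sequence into the angle range seen during training.
    inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))
    pos = np.asarray(positions, dtype=np.float64) / scale
    return np.outer(pos, inv_freq)  # shape: (num_positions, dim // 2)

def apply_rope(x, angles):
    # Standard rotary embedding: rotate consecutive (even, odd) feature
    # pairs by the angle assigned to their position.
    x1, x2 = x[..., 0::2], x[..., 1::2]
    cos, sin = np.cos(angles), np.sin(angles)
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out
```

With a scale factor of 4, position 4000 produces exactly the same rotation angles as position 1000 did during training, which is why the model can extrapolate to the longer range without seeing new angle values.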
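The 'needle in a haystack' evaluation can be sketched as follows: a known fact (the needle) is inserted at varying depths into filler text, and the model is asked to retrieve it from each position. This is a simplified harness for building such prompts; the function name and parameters are illustrative, and the scoring step against an actual model is omitted.

```python
def build_haystack_prompts(needle, filler_sentence, total_sentences, depths):
    # depths: fractions in [0, 1] giving where the needle is inserted,
    # e.g. 0.0 = start of the context, 1.0 = end of the context.
    prompts = []
    for d in depths:
        idx = int(d * total_sentences)
        sentences = [filler_sentence] * total_sentences
        sentences.insert(idx, needle)
        prompts.append(" ".join(sentences))
    return prompts
```

In a full evaluation, each prompt is sent to the model with a retrieval question, and recall is plotted as a function of depth and total context length; degraded recall for needles placed mid-context is a commonly reported failure mode.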