Context Window
The maximum amount of text (measured in tokens) that a language model can process in a single input-output interaction.
Why It Matters
The context window determines how much information the model can 'see' at once. A larger context window allows processing longer documents, maintaining longer conversations, and handling more complex tasks that require extensive context.
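As a concrete illustration, here is a minimal sketch of checking whether a prompt fits in a given context window. The token count uses a crude words-times-1.3 heuristic (an assumption for illustration; real models use subword tokenizers, so actual counts differ), and the window sizes and the reserved-output figure are example values, not any specific model's API.

```python
# Sketch: checking whether a prompt fits a context window.
# estimate_tokens uses a rough heuristic (~1.3 tokens per word);
# real subword tokenizers (e.g. BPE) will give different counts.

def estimate_tokens(text: str) -> int:
    """Crude token estimate: ~1.3 tokens per whitespace-separated word."""
    return int(len(text.split()) * 1.3)

def fits_in_context(prompt: str, context_window: int,
                    reserved_for_output: int = 512) -> bool:
    """True if the prompt plus space reserved for the reply fits."""
    return estimate_tokens(prompt) + reserved_for_output <= context_window

doc = "word " * 3000  # ~3,900 estimated tokens
print(fits_in_context(doc, context_window=4_096))    # tight once output space is reserved
print(fits_in_context(doc, context_window=128_000))
```

In practice, APIs count tokens exactly with the model's own tokenizer; the point of the sketch is that input length plus expected output length must stay under the window.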
Evolution of Context Lengths
GPT-3 shipped with a 2K token context. GPT-4 launched at 8K (with a 32K variant), and GPT-4 Turbo extended this to 128K. Claude supports 200K tokens. Google's Gemini 1.5 offers up to 1M+ tokens. Research pushes toward effectively unbounded context through techniques like sliding window attention and retrieval augmentation.
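Sliding window attention, mentioned above, restricts each token to attending over a fixed-size window of recent positions, so cost grows linearly rather than quadratically with sequence length. A minimal sketch of the attention mask (illustrative only; real implementations apply such masks inside fused attention kernels):

```python
# Sketch: a sliding-window attention mask. Each position i may attend
# only to itself and the previous (window - 1) positions, capping the
# number of attended tokens per row at `window`.

def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """mask[i][j] is True if position i may attend to position j."""
    return [
        [max(0, i - window + 1) <= j <= i for j in range(seq_len)]
        for i in range(seq_len)
    ]

# Visualize a small mask: 'x' = attended, '.' = masked out.
for row in sliding_window_mask(seq_len=6, window=3):
    print("".join("x" if m else "." for m in row))
```

Because each row has at most `window` active entries, attention work per token is constant in sequence length, which is what lets such models handle much longer inputs.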
Practical Implications
Longer contexts consume more memory and compute: self-attention in standard transformers scales as O(n^2) in sequence length. Techniques like FlashAttention, sparse attention, and retrieval-augmented generation (RAG) help manage the cost. Models also tend to perform worse on information buried in the middle of very long contexts (the 'lost in the middle' problem).
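The O(n^2) scaling can be made concrete with back-of-the-envelope arithmetic on the attention score matrix. These are rough single-head numbers at fp16, ignoring batching and kernel optimizations such as FlashAttention, which avoids materializing the full matrix:

```python
# Sketch: memory for one head's seq_len x seq_len attention score
# matrix at 2 bytes per entry (fp16). Quadratic growth is why naive
# attention becomes impractical at very long contexts.

def attention_matrix_bytes(seq_len: int, bytes_per_entry: int = 2) -> int:
    return seq_len * seq_len * bytes_per_entry

for n in (4_096, 128_000, 1_000_000):
    gib = attention_matrix_bytes(n) / 2**30
    print(f"{n:>9} tokens -> {gib:,.1f} GiB per head")
```

Going from a 4K to a 1M token window multiplies the matrix size by roughly 60,000x, which is why long-context systems rely on masking, approximation, or retrieval rather than full quadratic attention.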