Test-Time Compute
Using additional computation during inference (not just training) to improve model outputs, through techniques like chain-of-thought, self-reflection, or search.
The Idea
Rather than producing an answer in a single quick pass, spend more compute at inference time to 'think harder.' This can mean generating multiple candidates and selecting the best, reasoning step by step with chain-of-thought, or iteratively refining an output.
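The generate-multiple-candidates-and-select idea can be sketched as a generic best-of-N loop. Here `sample` and `score` are hypothetical stand-ins for a model's sampling call and a verifier or reward model; the toy answers and scores below are purely illustrative.

```python
import itertools

def best_of_n(prompt, sample, score, n):
    """Generate n candidate responses and return the highest-scoring one."""
    candidates = [sample(prompt) for _ in range(n)]
    return max(candidates, key=lambda c: score(prompt, c))

# Toy sampler: cycles through three candidate answers deterministically.
answers = itertools.cycle(["12", "14", "15"])
sample = lambda prompt: next(answers)

# Toy verifier: a real system would plug in a reward model or checker here.
score = lambda prompt, a: {"12": 0.2, "14": 0.5, "15": 0.9}[a]

print(best_of_n("What is 3 * 5?", sample, score, n=6))  # → 15
```

More samples cost more inference compute but raise the chance that at least one candidate is correct, which is the basic trade this technique makes.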
Techniques
Best-of-N sampling: Generate N responses, score them with a verifier or reward model, return the best.
Self-consistency: Sample multiple reasoning chains, take the majority answer.
Tree search: Explore multiple reasoning paths, expanding the most promising branches.
Self-reflection: Have the model critique and revise its own output.
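Of the techniques above, self-consistency reduces to a majority vote over the final answers extracted from each sampled chain. The chains below and the 'Answer:' extraction convention are illustrative assumptions, not a fixed standard:

```python
from collections import Counter

def self_consistency(chains):
    """Take the majority vote over final answers from sampled reasoning chains."""
    answers = [chain.rsplit("Answer:", 1)[-1].strip() for chain in chains]
    return Counter(answers).most_common(1)[0][0]

# Illustrative reasoning chains; one goes wrong, but the vote recovers.
chains = [
    "3 groups of 5 is 15. Answer: 15",
    "5 + 5 + 5 = 15. Answer: 15",
    "3 * 5... hmm... Answer: 35",  # a faulty chain
]
print(self_consistency(chains))  # → 15
```

Unlike best-of-N, this needs no scoring model: agreement among independently sampled chains serves as the signal.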
Impact
Models like o1 and DeepSeek-R1 use test-time compute to dramatically improve reasoning performance. This represents a shift from 'bigger model' to 'more thinking per problem' as a scaling axis.