Reasoning Model
An LLM specifically trained or prompted to perform explicit step-by-step reasoning before producing answers.
Overview
Reasoning models are language models optimized for complex reasoning tasks by spending additional inference-time computation on deliberate thinking. Models such as OpenAI's o1/o3 and DeepSeek-R1 are trained with reinforcement learning to generate an internal chain of reasoning before answering, dramatically improving performance on math, coding, and science problems.
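The split between the internal reasoning and the visible answer can be sketched concretely. The snippet below (a minimal illustration, assuming the model wraps its reasoning in `<think>...</think>` delimiters, as DeepSeek-R1 does; other models use different or hidden delimiters) parses a raw completion into the two parts:

```python
import re

def split_reasoning(raw_output: str) -> tuple[str, str]:
    """Separate the internal chain of thought from the final answer.

    Assumes the model marks its reasoning with <think>...</think> tags
    (the DeepSeek-R1 convention); anything after the closing tag is
    treated as the user-facing answer.
    """
    match = re.search(r"<think>(.*?)</think>", raw_output, flags=re.DOTALL)
    if match is None:
        # No reasoning block found: treat the whole output as the answer.
        return "", raw_output.strip()
    reasoning = match.group(1).strip()
    answer = raw_output[match.end():].strip()
    return reasoning, answer

raw = "<think>2 apples + 3 apples = 5 apples.</think>The answer is 5."
reasoning, answer = split_reasoning(raw)
# reasoning → "2 apples + 3 apples = 5 apples."
# answer    → "The answer is 5."
```

In deployed systems the reasoning segment is often hidden or summarized for the user, while only the answer segment is shown.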
Approaches
Techniques include training with process reward models (which score each reasoning step rather than only the final answer), reinforcement learning on reasoning traces (e.g., GRPO, PPO), test-time compute scaling (spending more tokens thinking before answering), and chain-of-thought fine-tuning. These models trade latency for accuracy: they take longer and consume more tokens per response, but achieve significantly better results on complex tasks.
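One simple form of test-time compute scaling is self-consistency: sample several independent reasoning chains and return the majority-vote answer, so that more samples (more compute) yield higher reliability. The sketch below illustrates the idea with a stubbed sampler standing in for a real model call; the 70% per-chain accuracy and the answer "42" are assumptions for the demo, not properties of any particular model.

```python
import random
from collections import Counter

def sample_answer(rng: random.Random) -> str:
    """Stand-in for one stochastic reasoning chain from a model.

    For illustration, it returns the correct answer "42" with
    probability 0.7 and an arbitrary wrong digit otherwise; a real
    system would issue a fresh sampled model call here.
    """
    return "42" if rng.random() < 0.7 else str(rng.randint(0, 9))

def self_consistency(n_samples: int, seed: int = 0) -> str:
    """Sample n reasoning chains and return the majority-vote answer.

    Spending more samples is a direct trade of compute (and latency)
    for accuracy: individual chains are noisy, but their modal answer
    is far more reliable.
    """
    rng = random.Random(seed)
    answers = [sample_answer(rng) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

print(self_consistency(1))    # a single chain may well be wrong
print(self_consistency(25))   # majority vote over 25 chains is much more reliable
```

Because correct chains concentrate on one answer while errors scatter, the vote sharpens quickly with the sample count, which is why "thinking longer" (or sampling more) helps most on problems where each chain has a reasonable chance of being right.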