Sampling Strategy
The method used to select the next token during language model text generation, controlling the balance between quality, diversity, and creativity.
Common Strategies
Greedy: Always pick the highest-probability token. Deterministic but prone to repetition.
Temperature sampling: Rescale the probability distribution to control randomness.
Top-k: Only consider the k most likely tokens.
Top-p (nucleus): Consider the smallest set of tokens whose cumulative probability reaches p.
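The strategies above can be sketched in pure Python over a vector of logits. This is an illustrative sketch, not any particular library's implementation; the function names and signatures are my own.

```python
import math
import random

def softmax(logits, temperature=1.0):
    # Temperature scaling: divide logits by T, then normalize.
    # T < 1 sharpens the distribution; T > 1 flattens it.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy(logits):
    # Always pick the highest-probability token (deterministic).
    return max(range(len(logits)), key=lambda i: logits[i])

def top_k_sample(logits, k, temperature=1.0, rng=random):
    # Keep only the k most likely tokens, renormalize, then sample.
    probs = softmax(logits, temperature)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    weights = [probs[i] for i in top]
    return rng.choices(top, weights=weights, k=1)[0]

def top_p_sample(logits, p, temperature=1.0, rng=random):
    # Keep the smallest set of tokens whose cumulative probability
    # reaches p (the "nucleus"), then sample from that set.
    probs = softmax(logits, temperature)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus, cum = [], 0.0
    for i in order:
        nucleus.append(i)
        cum += probs[i]
        if cum >= p:
            break
    weights = [probs[i] for i in nucleus]
    return rng.choices(nucleus, weights=weights, k=1)[0]
```

Note that top-k with k=1 and top-p with a very small p both collapse to greedy decoding, which is a quick way to sanity-check an implementation.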
Combinations
In practice, strategies are combined: temperature + top-p is the most common pairing. Some APIs also support frequency penalty (penalize repeated tokens) and presence penalty (encourage topic diversity).
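The penalty mechanics can be sketched as a logit adjustment applied before sampling, modeled on the additive formula documented by the OpenAI API (subtract a per-occurrence frequency term plus a flat presence term); the function name and exact signature here are illustrative.

```python
from collections import Counter

def apply_penalties(logits, generated_ids,
                    frequency_penalty=0.0, presence_penalty=0.0):
    # For each token already emitted, lower its logit by:
    #   frequency_penalty * (number of times it appeared)  -> discourages repetition
    #   presence_penalty  * (1 if it appeared at all)      -> encourages new topics
    counts = Counter(generated_ids)
    adjusted = list(logits)
    for tok, c in counts.items():
        adjusted[tok] -= frequency_penalty * c + presence_penalty
    return adjusted
```

A combined pipeline would apply the penalties first, then temperature scaling and top-p filtering on the adjusted logits.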
Task-Specific Choices
Code generation: low temperature (0.0-0.3), since precision matters more than variety.
Creative writing: higher temperature (0.7-1.0), since diversity is valued.
Factual Q&A: low temperature, ideally with grounding in retrieved sources.
Brainstorming: high temperature, to surface a wide range of candidates.
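These task-specific choices are often captured as named presets. The preset names and the top_p values below are illustrative assumptions; only the temperature ranges come from the guidance above.

```python
# Hypothetical preset table: temperatures follow the guidance above,
# top_p values are plausible companions, not prescribed by it.
PRESETS = {
    "code":        {"temperature": 0.2, "top_p": 0.9},
    "creative":    {"temperature": 0.9, "top_p": 0.95},
    "factual_qa":  {"temperature": 0.1, "top_p": 0.9},
    "brainstorm":  {"temperature": 1.0, "top_p": 1.0},
}

def sampling_params(task):
    # Fall back to a moderate default for unknown tasks.
    return PRESETS.get(task, {"temperature": 0.7, "top_p": 0.9})
```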