Top-P (Nucleus) Sampling
A text generation strategy that samples from the smallest set of tokens whose cumulative probability exceeds a threshold p, dynamically adjusting the candidate pool.
How It Works
Sort all possible next tokens by probability in descending order. Starting from the most likely, accumulate probabilities until the sum exceeds p, then sample only from this dynamic set. A top-p of 0.9 means sampling from the smallest set of tokens that together cover at least 90% of the probability mass.
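The procedure above can be sketched in a few lines of NumPy. This is a minimal illustration, not any particular library's implementation; the function name `top_p_sample` is invented for the example:

```python
import numpy as np

def top_p_sample(probs, p=0.9, rng=None):
    """Sample a token index from the nucleus: the smallest set of
    highest-probability tokens whose cumulative mass exceeds p."""
    rng = rng or np.random.default_rng()
    order = np.argsort(probs)[::-1]              # token indices, most likely first
    sorted_probs = probs[order]
    cumulative = np.cumsum(sorted_probs)
    # Keep tokens up to and including the first one that pushes the sum past p.
    cutoff = np.searchsorted(cumulative, p) + 1
    nucleus = order[:cutoff]
    # Renormalize the surviving probabilities so they sum to 1, then sample.
    nucleus_probs = sorted_probs[:cutoff] / sorted_probs[:cutoff].sum()
    return int(rng.choice(nucleus, p=nucleus_probs))

probs = np.array([0.5, 0.3, 0.15, 0.05])
# With p = 0.9 the nucleus is tokens {0, 1, 2} (cumulative mass 0.95);
# token 3 can never be sampled.
print(top_p_sample(probs, p=0.9))
```

Note the renormalization step: after discarding the tail, the remaining probabilities must be rescaled so the sampler still draws from a valid distribution.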
Why Top-P Over Top-K
Top-k always considers exactly k tokens regardless of the probability distribution. Top-p adapts: when the model is confident (one token has 95% probability), the candidate set is tiny; when it is uncertain, the set grows. This adaptivity tends to produce more natural text than a fixed cutoff, avoiding both low-quality tail tokens when the model is sure and over-restriction when it is not.
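The adaptive behavior is easy to see by counting nucleus sizes for a peaked versus a flat distribution. The helper name `nucleus_size` is hypothetical, made up for this sketch:

```python
import numpy as np

def nucleus_size(probs, p=0.9):
    # Number of tokens in the smallest set whose cumulative probability exceeds p.
    sorted_probs = np.sort(probs)[::-1]
    return int(np.searchsorted(np.cumsum(sorted_probs), p) + 1)

confident = np.array([0.95, 0.02, 0.01, 0.01, 0.01])   # model is sure
uncertain = np.full(16, 0.0625)                        # flat over 16 tokens

print(nucleus_size(confident, p=0.9))   # 1: the top token alone exceeds 90%
print(nucleus_size(uncertain, p=0.9))   # 15: nearly the whole vocabulary
```

A fixed top-k (say k = 5) would either drag in noise tokens in the confident case or clip legitimate candidates in the uncertain one; top-p does neither.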
Practical Settings
Common defaults: top-p = 0.9-0.95 for general use, often combined with temperature for finer control. Top-p = 1.0 disables the filter (sample from all tokens). Top-p = 0.0 is effectively greedy decoding: the smallest set exceeding zero mass is the single most likely token.
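In practice the two knobs compose in a fixed order: temperature rescales the logits first, then top-p filters the resulting distribution. A minimal sketch of that pipeline, assuming raw logits as input (the function name is invented for illustration):

```python
import numpy as np

def sample(logits, temperature=0.7, top_p=0.9, rng=None):
    rng = rng or np.random.default_rng()
    # 1. Temperature: divide logits before softmax; lower = sharper distribution.
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())    # subtract max for numerical stability
    probs /= probs.sum()
    # 2. Top-p: keep the smallest prefix of sorted tokens whose mass exceeds p.
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, top_p) + 1
    nucleus = order[:cutoff]
    # 3. Renormalize and sample from the nucleus.
    nucleus_probs = probs[nucleus] / probs[nucleus].sum()
    return int(rng.choice(nucleus, p=nucleus_probs))

logits = np.array([2.0, 1.0, 0.2, -1.0])
token = sample(logits, temperature=0.7, top_p=0.9)
```

With `top_p=1.0` step 2 keeps every token and the call reduces to plain temperature sampling, matching the "disables the filter" behavior described above.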