AI Glossary

Perplexity

A metric for evaluating language models that measures how 'surprised' the model is by test data. Lower perplexity means the model predicts the text better.

The Math

Perplexity is the exponentiated average negative log-likelihood per token. For a test sequence of N tokens x_1, ..., x_N:

PPL = exp( -(1/N) * sum_i log p(x_i | x_1, ..., x_{i-1}) )

A perplexity of 10 means the model is, on average, as uncertain as choosing uniformly among 10 options for the next token.
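The definition above can be sketched in a few lines of Python. The probabilities here are a hypothetical toy input standing in for the probability a model assigned to each actual next token of a test sequence:

```python
import math

def perplexity(token_probs):
    """Exponentiated average negative log-likelihood per token.

    token_probs: probability the model assigned to each observed
    next token in the test sequence (toy values here).
    """
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)

# A model that assigns every token probability 1/10 has perplexity 10
# (up to floating-point rounding), matching the uniform-choice reading:
print(perplexity([0.1] * 5))
```

Real evaluations compute the inner log-probabilities from the model's softmax outputs over the whole vocabulary, but the aggregation is exactly this.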

Interpretation

Lower is better. A perplexity of 1 means perfect prediction. State-of-the-art LLMs achieve perplexities in the single digits on standard benchmarks. Perplexity is the standard intrinsic evaluation metric for language models.
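A quick sanity check of these interpretations, using toy probabilities (not real model outputs): perfect prediction gives exactly 1, and a model that is more confident in the correct tokens scores lower.

```python
import math

def perplexity(token_probs):
    # Exponentiated average negative log-likelihood per token.
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)

# Probability 1.0 on every correct token: perplexity is exactly 1.
print(perplexity([1.0, 1.0, 1.0]))

# Confident correct predictions score lower than uncertain ones.
print(perplexity([0.9, 0.8, 0.95]))   # low: roughly 1.1
print(perplexity([0.3, 0.2, 0.25]))   # higher: roughly 4
```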

Limitations

Perplexity doesn't directly measure usefulness, safety, or instruction-following ability. A model with low perplexity might still generate harmful or unhelpful content. That's why modern LLM evaluation also uses human preference ratings and task-specific benchmarks.

Last updated: March 5, 2026