AI Glossary

ROUGE Score

Recall-Oriented Understudy for Gisting Evaluation -- a set of metrics for evaluating text summarization by measuring overlap between generated and reference summaries.

Variants

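ROUGE-1: Unigram overlap.
ROUGE-2: Bigram overlap.
ROUGE-L: Longest common subsequence (LCS) between the generated and reference summary.
ROUGE-Lsum: LCS computed sentence by sentence and aggregated over the whole summary, intended for multi-sentence summaries.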
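
As a rough illustration of the variants above, here is a minimal sketch of ROUGE-N and ROUGE-L in plain Python. The function names (`ngram_counts`, `rouge_n`, `lcs_length`, `rouge_l`) are made up for this example, and it assumes lowercased whitespace tokenization with no stemming, so its numbers may differ from those of established ROUGE packages.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def _prf(overlap, cand_total, ref_total):
    """Turn an overlap count into (precision, recall, F1)."""
    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def rouge_n(candidate, reference, n=1):
    """ROUGE-N: clipped n-gram overlap between candidate and reference."""
    # Assumption: simple lowercase + whitespace tokenization.
    cand = ngram_counts(candidate.lower().split(), n)
    ref = ngram_counts(reference.lower().split(), n)
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((cand & ref).values())
    return _prf(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, 1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """ROUGE-L: precision, recall, and F1 based on LCS length."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    return _prf(lcs_length(cand, ref), len(cand), len(ref))


if __name__ == "__main__":
    generated = "the cat sat on the mat"
    reference = "the cat lay on the mat"
    print("ROUGE-1:", rouge_n(generated, reference, 1))
    print("ROUGE-2:", rouge_n(generated, reference, 2))
    print("ROUGE-L:", rouge_l(generated, reference))
```

For this pair, five of the six reference unigrams and three of the five reference bigrams are matched, and the longest common subsequence ("the cat on the mat") has five tokens.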

Limitations

Like BLEU, ROUGE measures surface-level overlap and misses semantic equivalence, so a valid paraphrase of the reference can score poorly. Embedding-based metrics such as BERTScore, along with human evaluation, can capture meaning more directly. ROUGE is still widely reported as a baseline metric.
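
The paraphrase problem can be seen in a small sketch. The helper `rouge1_recall` and the example sentences are invented for illustration, and the tokenization (lowercase, whitespace split) is an assumption rather than what any particular ROUGE package does.

```python
from collections import Counter


def rouge1_recall(candidate, reference):
    """Fraction of reference unigrams that also appear in the candidate."""
    # Assumption: lowercase + whitespace tokenization, no stemming.
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    total = sum(ref.values())
    return overlap / total if total else 0.0


reference = "the film was a huge success"
near_copy = "the film was a success"         # drops one word
paraphrase = "the movie did extremely well"  # similar meaning, different words

print(f"near copy:  {rouge1_recall(near_copy, reference):.2f}")   # 0.83
print(f"paraphrase: {rouge1_recall(paraphrase, reference):.2f}")  # 0.17
```

The paraphrase shares only the word "the" with the reference, so it scores far lower than the near copy even though a human reader would likely judge both as acceptable summaries.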

Last updated: March 5, 2026