ROUGE Score
Recall-Oriented Understudy for Gisting Evaluation: a set of metrics for evaluating automatic text summarization by measuring word overlap (shared n-grams or word sequences) between a generated (candidate) summary and one or more human-written reference summaries.
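The overlap idea can be sketched as ROUGE-N, which counts matching n-grams. This is a minimal illustration, not a reference implementation. It assumes lowercase whitespace tokenization with no stemming, and the helper names ngrams and rouge_n are hypothetical. Matches are clipped, so an n-gram is credited at most as many times as it appears in the reference. The sketch reports recall, which the name emphasizes, and also reports precision and F1, as many implementations do. For published numbers, a maintained implementation such as Google's rouge-score package, which has its own tokenizer and optional Porter stemming, is the safer choice.

```python
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate: str, reference: str, n: int = 1) -> dict[str, float]:
    """Simplified ROUGE-N between one candidate and one reference.

    Assumption: lowercase whitespace tokenization, no stemming.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection keeps the minimum count of each shared n-gram,
    # so repeated n-grams are not over-credited (clipped counts).
    overlap = sum((cand & ref).values())
    recall = overlap / max(sum(ref.values()), 1)
    precision = overlap / max(sum(cand.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


reference = "the cat sat on the mat"
candidate = "the cat lay on the mat"
print(rouge_n(candidate, reference, n=1)["recall"])  # 5 of 6 reference unigrams
print(rouge_n(candidate, reference, n=2)["recall"])  # 3 of 5 reference bigrams
```

Changing a single word leaves ROUGE-1 recall at 5/6 but drops ROUGE-2 recall to 0.6. Higher-order n-grams penalize local wording changes more heavily.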
Variants
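ROUGE-1: Unigram (single-word) overlap.
ROUGE-2: Bigram (adjacent word pair) overlap.
ROUGE-L: Longest common subsequence (LCS). Matched words must appear in the same order but need not be contiguous.
ROUGE-Lsum: Summary-level LCS for multi-sentence summaries. Candidate and reference are split into sentences (on newlines, in Google's rouge-score package). For each reference sentence, the LCS matches against all candidate sentences are combined into a "union LCS".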
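ROUGE-L can be sketched with the standard dynamic-programming LCS. This is a minimal illustration that assumes lowercase whitespace tokenization, no stemming, and F1 with equal weight on precision and recall. The names lcs_length and rouge_l are hypothetical. It computes the single-sequence ROUGE-L only and does not implement the union-LCS step of ROUGE-Lsum.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence via O(len(a) * len(b)) DP."""
    # prev[j] holds the LCS length of the tokens of `a` seen so far and b[:j].
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                curr[j] = prev[j - 1] + 1  # extend the match on the diagonal
            else:
                curr[j] = max(prev[j], curr[j - 1])  # skip a token in a or b
        prev = curr
    return prev[-1]


def rouge_l(candidate: str, reference: str) -> dict[str, float]:
    """Simplified ROUGE-L with F1 (beta = 1).

    Assumption: lowercase whitespace tokenization, no stemming.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    lcs = lcs_length(cand, ref)
    recall = lcs / max(len(ref), 1)
    precision = lcs / max(len(cand), 1)
    f1 = 0.0 if lcs == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


reference = "police killed the gunman"
print(rouge_l("police kill the gunman", reference)["recall"])  # LCS 3 of 4
print(rouge_l("the gunman kill police", reference)["recall"])  # LCS 2 of 4
```

Both candidates share the same three reference words ("police", "the", "gunman"). The first keeps all three in reference order (recall 0.75). The second keeps only "the gunman" in order (recall 0.5), so ROUGE-L rewards word order without requiring contiguous matches.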
Limitations
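Like BLEU, ROUGE measures surface-level overlap and misses semantic equivalence. A paraphrase that uses different words, such as synonyms, scores poorly. A candidate that reuses the reference wording but changes its meaning can score well. BERTScore, which matches tokens by contextual embedding similarity, and human evaluation capture meaning better. ROUGE is nonetheless still widely reported as a baseline metric.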
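The paraphrase problem is easy to reproduce with a bare ROUGE-1 recall computation. The sketch below uses a hypothetical helper, rouge1_recall, with lowercase whitespace tokenization and no stemming or synonym handling. It scores a faithful paraphrase and a meaning-reversing edit against the same reference.

```python
from collections import Counter


def rouge1_recall(candidate: str, reference: str) -> float:
    """Fraction of reference unigrams matched by the candidate (clipped counts)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())
    return overlap / max(sum(ref.values()), 1)


reference = "the movie was excellent"
paraphrase = "the film was great"         # same meaning, different words
contradiction = "the movie was terrible"  # opposite meaning, shared words

print(rouge1_recall(paraphrase, reference))     # 0.5
print(rouge1_recall(contradiction, reference))  # 0.75
```

The contradiction outscores the paraphrase because ROUGE counts only exact token matches. An embedding-based metric such as BERTScore instead gives partial credit to semantically similar tokens like "film" and "movie".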