F1 Score
The harmonic mean of precision and recall, balancing both metrics in a single score.
Overview
The F1 score is a classification metric that combines precision and recall into a single number using their harmonic mean: F1 = 2 * (precision * recall) / (precision + recall). It ranges from 0 (worst) to 1 (perfect), and is especially useful when you need to balance false positives and false negatives.
Key Details
The harmonic mean penalizes extreme imbalances — a model with high precision but low recall (or vice versa) will have a low F1 score. Variants include macro F1 (the unweighted average of per-class F1 scores), micro F1 (F1 computed from globally pooled true positives, false positives, and false negatives), and weighted F1 (per-class F1 averaged with weights proportional to each class's support). F1 is the standard evaluation metric for NLP tasks such as named entity recognition, text classification, and information extraction.
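The three variants can be sketched by computing per-class counts and aggregating them in the three different ways. This is an illustrative implementation under the assumption of single-label multi-class classification (function names and the example labels are hypothetical):

```python
from collections import Counter

def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 from confusion-matrix counts; 0.0 when there are no true positives."""
    if tp == 0:
        return 0.0
    p, r = tp / (tp + fp), tp / (tp + fn)
    return 2 * p * r / (p + r)

def averaged_f1(y_true: list, y_pred: list) -> dict:
    """Return macro, micro, and weighted F1 for single-label multi-class data."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    # Per-class counts, treating each class as "positive" one at a time
    counts = {
        c: (
            sum(t == c and p == c for t, p in pairs),  # TP
            sum(t != c and p == c for t, p in pairs),  # FP
            sum(t == c and p != c for t, p in pairs),  # FN
        )
        for c in labels
    }
    # Macro: unweighted mean of per-class F1
    macro = sum(f1_from_counts(*counts[c]) for c in labels) / len(labels)
    # Micro: pool counts globally, then compute one F1
    micro = f1_from_counts(
        sum(c[0] for c in counts.values()),
        sum(c[1] for c in counts.values()),
        sum(c[2] for c in counts.values()),
    )
    # Weighted: per-class F1 weighted by class frequency (support) in y_true
    support = Counter(y_true)
    weighted = sum(
        support[c] / len(y_true) * f1_from_counts(*counts[c]) for c in labels
    )
    return {"macro": macro, "micro": micro, "weighted": weighted}

scores = averaged_f1(
    y_true=["a", "a", "a", "b", "b", "c"],
    y_pred=["a", "a", "b", "b", "c", "c"],
)
print(scores)
```

One useful property to notice: in the single-label multi-class setting, every misclassification counts as one false positive (for the predicted class) and one false negative (for the true class), so pooled FP equals pooled FN and micro F1 reduces to accuracy.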