AI Glossary

Class Imbalance

A dataset condition where some classes have significantly more examples than others, causing models to be biased toward the majority class.

The Problem

If 99% of transactions are legitimate and 1% are fraudulent, a model that always predicts 'legitimate' achieves 99% accuracy while catching zero fraud. Standard training, which minimizes average loss over all examples, naturally drifts toward this majority-class behavior.
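The accuracy paradox above can be demonstrated in a few lines of plain Python. This is a toy sketch with made-up counts (990 legitimate, 10 fraudulent) chosen to mirror the 99/1 split in the text:

```python
# Toy fraud dataset: 990 legitimate (label 0), 10 fraudulent (label 1).
labels = [0] * 990 + [1] * 10

# A "model" that always predicts the majority class.
predictions = [0] * len(labels)

accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
true_positives = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
recall = true_positives / sum(labels)  # fraction of actual fraud caught

print(f"accuracy = {accuracy:.2%}")  # 99.00%
print(f"recall   = {recall:.2%}")    # 0.00% -- catches zero fraud
```

High accuracy, zero recall on the class that matters: this is why accuracy alone is misleading under class imbalance.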

Solutions

- Oversampling: Duplicate minority examples (SMOTE creates synthetic ones by interpolating between minority neighbors).
- Undersampling: Remove majority examples.
- Class weights: Penalize mistakes on the minority class more heavily during training.
- Threshold adjustment: Lower the decision threshold for predicting the minority class.
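Two of these remedies, random oversampling and class weights, can be sketched with the standard library alone. The dataset, the 95/5 split, and the weight formula `n_samples / (n_classes * n_c)` (the "balanced" convention used by scikit-learn, among others) are illustrative assumptions, not prescribed by the text:

```python
import random
from collections import Counter

random.seed(0)

# Imbalanced toy data: 95 majority-class examples (0), 5 minority (1).
data = [(i, 0) for i in range(95)] + [(i, 1) for i in range(5)]
counts = Counter(label for _, label in data)

# --- Random oversampling: duplicate minority examples until classes match.
majority_size = max(counts.values())
balanced = list(data)
for label, count in counts.items():
    same_class = [ex for ex in data if ex[1] == label]
    balanced += random.choices(same_class, k=majority_size - count)
print(Counter(label for _, label in balanced))  # both classes now at 95

# --- Class weights: weight_c = n_samples / (n_classes * n_c), so rarer
# classes contribute proportionally more to the training loss.
n = len(data)
weights = {label: n / (len(counts) * count) for label, count in counts.items()}
print(weights)  # minority class gets the larger weight
```

SMOTE differs from this naive duplication by synthesizing new minority points between existing neighbors rather than copying them verbatim.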

Evaluation

Use precision, recall, F1-score, and AUC-ROC instead of accuracy. Look at the confusion matrix. Stratified cross-validation ensures each fold maintains the class distribution.
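The metrics above all derive from the confusion matrix. A minimal sketch, using hypothetical counts for a fraud model on 1,000 transactions (the specific TN/FP/FN/TP values are invented for illustration):

```python
# Hypothetical confusion matrix:
#                  predicted legit   predicted fraud
# actual legit          985                5          (TN=985, FP=5)
# actual fraud            4                6          (FN=4,  TP=6)
tn, fp, fn, tp = 985, 5, 4, 6

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)   # of transactions flagged, how many were fraud
recall = tp / (tp + fn)      # of actual fraud, how much was caught
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy  = {accuracy:.3f}")   # 0.991 -- still looks excellent
print(f"precision = {precision:.3f}")  # 0.545
print(f"recall    = {recall:.3f}")     # 0.600
print(f"f1        = {f1:.3f}")         # 0.571
```

Note how accuracy stays near 99% while precision, recall, and F1 reveal a model of only middling quality on the minority class.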


Last updated: March 5, 2026