AI Glossary

Statistical Significance

A measure of whether observed differences in model performance are likely real or due to random chance.

Overview

Statistical significance in machine learning indicates whether an observed difference in model performance (e.g., Model A achieves 85% accuracy vs. Model B's 83%) is likely to reflect a genuine difference rather than random variation in the evaluation data. It is typically assessed with hypothesis tests that produce p-values, commonly judged against a threshold of p < 0.05.
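As an illustration, two models evaluated on the same test set can be compared with a paired permutation test. The sketch below (function and variable names are illustrative) takes per-example correctness flags for each model and, under the null hypothesis that the models are equally accurate, randomly flips the sign of each per-example difference to build the null distribution:

```python
import random

def paired_permutation_test(correct_a, correct_b, n_perm=10_000, seed=0):
    """Two-sided paired permutation test for an accuracy difference.

    correct_a, correct_b: equal-length lists of 0/1 flags marking which
    test examples each model got right.  Returns an estimated p-value
    for the null hypothesis that the two models are equally accurate.
    """
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(correct_a, correct_b)]
    observed = abs(sum(diffs))
    count = 0
    for _ in range(n_perm):
        # Under the null, each per-example difference is equally likely
        # to carry either sign, so flip signs uniformly at random.
        permuted = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(permuted) >= observed:
            count += 1
    return count / n_perm
```

If the models agree on every example, the returned p-value is 1.0 (no evidence of a difference); if one model is consistently right where the other is wrong, the p-value shrinks toward zero.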

Key Details

Common tests include paired t-tests, McNemar's test (for comparing classifiers), bootstrap confidence intervals, and permutation tests. Simply reporting that one model has a higher score is insufficient — without significance testing, apparent improvements may be noise. This is especially important when differences are small, datasets are small, or many model variants are compared (requiring correction for multiple comparisons).
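For paired classifier comparisons, McNemar's test uses only the discordant examples, i.e. those that exactly one of the two models classifies correctly. A minimal sketch (names are illustrative), using the continuity-corrected statistic and the closed-form tail probability of a chi-squared variable with 1 degree of freedom:

```python
import math

def mcnemar_test(correct_a, correct_b):
    """McNemar's test for two classifiers scored on the same test set.

    correct_a, correct_b: equal-length lists of 0/1 correctness flags.
    Returns the p-value for the null hypothesis that the two models
    have the same error rate.
    """
    # b: examples A got right but B got wrong; c: the reverse.
    b = sum(1 for x, y in zip(correct_a, correct_b) if x == 1 and y == 0)
    c = sum(1 for x, y in zip(correct_a, correct_b) if x == 0 and y == 1)
    if b + c == 0:
        return 1.0  # no disagreements, hence no evidence of a difference
    # Continuity-corrected chi-squared statistic on the discordant pairs.
    chi2 = (abs(b - c) - 1) ** 2 / (b + c)
    # For 1 degree of freedom, P(X >= chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))
```

Because only the discordant counts enter the statistic, the test stays informative even when both models agree (rightly or wrongly) on most of the test set.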

Related Concepts

model evaluation · ablation study · cross-validation


Last updated: March 5, 2026