AI Glossary

Scaling Law

Empirical relationships showing that model performance improves predictably as a power law with more compute, data, or parameters.

Key Findings

Kaplan et al. (2020) showed that language-model loss decreases as a power law in model size, dataset size, and training compute. The Chinchilla paper (Hoffmann et al., 2022) showed that most large models of the time were undertrained: for a fixed compute budget, parameters and training tokens should be scaled up in roughly equal proportion.
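The Chinchilla result can be sketched with its parametric loss form, L(N, D) = E + A/N^α + B/D^β, together with the standard approximation that training compute C ≈ 6·N·D FLOPs. The constants below are the approximate published fit from Hoffmann et al. (2022); treat them, the grid bounds, and the example budget as illustrative assumptions, not exact values.

```python
# Sketch of the Chinchilla parametric loss L(N, D) = E + A/N**alpha + B/D**beta,
# with approximate fitted constants from Hoffmann et al. (2022) -- illustrative only.
E, A, B = 1.69, 406.4, 410.7
ALPHA, BETA = 0.34, 0.28

def loss(n_params: float, n_tokens: float) -> float:
    """Predicted final training loss for n_params parameters and n_tokens tokens."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA

def optimal_split(compute_flops: float, steps: int = 2000):
    """Grid-search the compute-optimal split between parameters and data,
    holding compute fixed via the approximation C ~ 6 * N * D FLOPs."""
    best = None
    for i in range(1, steps):
        n = 10 ** (6 + 6 * i / steps)        # scan N from ~1e6 to ~1e12 params
        d = compute_flops / (6 * n)          # tokens implied by the fixed budget
        current = (loss(n, d), n, d)
        if best is None or current[0] < best[0]:
            best = current
    return best

# Example budget roughly matching Chinchilla's (70B params x 1.4T tokens).
l, n, d = optimal_split(5.76e23)
print(f"optimal N ~ {n:.2e} params, D ~ {d:.2e} tokens, loss ~ {l:.2f}")
```

Because the budget is fixed, making the model larger forces fewer training tokens, and the search finds the point where the two penalty terms balance.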

Implications

Scaling laws let practitioners predict the performance of larger models before training them, and they guide how a compute budget is split between model size and data. The "Bitter Lesson" (Rich Sutton, 2019) argues that general methods that leverage computation consistently outperform approaches built on hand-engineered domain knowledge.
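Predicting larger models before training them amounts to fitting a power law to small runs and extrapolating. A minimal sketch, using synthetic run results and a plain least-squares fit in log-log space (the run data, budgets, and fitted constants here are invented for illustration):

```python
import math

# Synthetic (illustrative) small-run results: training compute in FLOPs -> final loss.
# In practice these would come from a sweep of real small training runs.
runs = [(1e18, 3.10), (1e19, 2.65), (1e20, 2.27), (1e21, 1.94)]

# Fit log(loss) = log(a) - b*log(C) by ordinary least squares.
xs = [math.log(c) for c, _ in runs]
ys = [math.log(l) for _, l in runs]
n = len(runs)
mx, my = sum(xs) / n, sum(ys) / n
b = -sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
a = math.exp(my + b * mx)

def predict_loss(compute: float) -> float:
    """Extrapolate the fitted power law loss(C) = a * C**(-b) to an untrained budget."""
    return a * compute ** (-b)

print(f"fit: loss ~ {a:.3g} * C^(-{b:.3f})")
print(f"predicted loss at 1e23 FLOPs: {predict_loss(1e23):.2f}")
```

The extrapolation is only as trustworthy as the power-law assumption; real curves can bend at scales far beyond the fitted range.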


Last updated: March 5, 2026