Scaling Law
Empirical relationships showing that model loss improves predictably, following a power law in compute, dataset size, and parameter count.
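The power-law form can be sketched as a one-line function. The constants below are illustrative values of the kind reported for parameter-count scaling; treat them as placeholders for whatever a particular fit produces.

```python
# Illustrative scaling-law curve: L(N) = (N_c / N) ** alpha,
# where N is parameter count. N_c and alpha here are hypothetical
# constants chosen only to show the shape of the relationship.
def loss(n_params, n_c=8.8e13, alpha=0.076):
    return (n_c / n_params) ** alpha
```

The key property is monotone improvement: each order of magnitude of scale cuts the loss by a roughly constant factor, which is a straight line on a log-log plot.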
Key Findings
Kaplan et al. (2020) showed that loss falls as a power law in model size, dataset size, and training compute. Chinchilla (Hoffmann et al., 2022) showed that most large models were undertrained: compute-optimal training scales data and parameters roughly equally, at about 20 training tokens per parameter.
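The Chinchilla rule of thumb can be turned into a small allocation calculator. It assumes the standard approximation that training cost is C ≈ 6·N·D FLOPs and the roughly 20-tokens-per-parameter ratio; both are approximations from the paper, not exact constants.

```python
import math

def chinchilla_allocation(flops):
    # Compute-optimal split under C ~= 6 * N * D with D ~= 20 * N:
    # C = 6 * N * (20 * N) = 120 * N^2, so N = sqrt(C / 120).
    n_params = math.sqrt(flops / 120)
    n_tokens = 20 * n_params
    return n_params, n_tokens
```

Plugging in Chinchilla's own budget of about 5.76e23 FLOPs gives roughly 70B parameters and 1.4T tokens, matching the model actually trained.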
Implications
Scaling laws let practitioners predict the performance of larger models before training them, and they guide how a fixed compute budget is split between parameters and data. The 'Bitter Lesson' (Rich Sutton, 2019) argues that general methods that scale with compute consistently outperform hand-engineered approaches.
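Predicting a larger model's performance amounts to fitting a line in log-log space on small-model runs and extrapolating. A minimal sketch, using synthetic data points generated from an assumed power law purely to illustrate the procedure:

```python
import math

def fit_power_law(sizes, losses):
    # Least-squares line in log-log space: log L = intercept + slope * log N.
    xs = [math.log(n) for n in sizes]
    ys = [math.log(l) for l in losses]
    k = len(xs)
    mx, my = sum(xs) / k, sum(ys) / k
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return slope, intercept

def predict_loss(n_params, slope, intercept):
    # Extrapolate the fitted law: L(N) = exp(intercept) * N ** slope.
    return math.exp(intercept + slope * math.log(n_params))
```

In practice the fit is done on real training runs spanning a couple of orders of magnitude, and the extrapolation is what makes compute planning for the next, larger model possible.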