Neural Scaling Laws
Empirical laws describing how model performance improves predictably with increased compute, data, and parameters.
Overview
Neural scaling laws are empirical relationships showing that model performance (measured by cross-entropy loss) improves as a smooth power law as model parameters, training data, and compute budget increase. They were first formalized by Kaplan et al. (2020) at OpenAI and later refined by Hoffmann et al. (2022) with the Chinchilla scaling laws.
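The power-law relationship can be sketched with the parametric loss form from Hoffmann et al. (2022), L(N, D) = E + A/N^α + B/D^β, where N is parameter count and D is training tokens. The constants below are the fitted values reported in that paper; treat the whole function as an illustrative sketch, not an exact predictor for any particular model.

```python
# Parametric loss from Hoffmann et al. (2022):
#   L(N, D) = E + A / N**alpha + B / D**beta
# Constants are the paper's fitted values (illustrative, not exact).
E, A, B = 1.69, 406.4, 410.7   # irreducible loss and fitted scale terms
ALPHA, BETA = 0.34, 0.28       # power-law exponents for params and data

def predicted_loss(n_params: float, n_tokens: float) -> float:
    """Predicted pretraining loss for a model with n_params parameters
    trained on n_tokens tokens."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA

# Loss falls smoothly as either axis grows; e.g. roughly Chinchilla's
# configuration of 70B parameters trained on 1.4T tokens:
print(round(predicted_loss(70e9, 1.4e12), 3))
```

Note that both terms shrink smoothly and predictably, which is what makes extrapolation from small pilot runs possible.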
Implications
Scaling laws enable organizations to predict model performance before training, optimize the allocation of compute between model size and data (the Chinchilla-optimal ratio), and estimate the cost of reaching target performance levels. However, scaling laws describe loss, not specific capabilities; emergent abilities may not follow smooth scaling curves. Current research also explores scaling laws for inference-time compute (e.g., reasoning or "thinking" tokens) in addition to training compute.
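The Chinchilla-optimal allocation mentioned above can be sketched with the common rule of thumb derived from Hoffmann et al. (2022): training cost is roughly C ≈ 6·N·D FLOPs, and the loss-optimal split puts about 20 training tokens per parameter. The helper function below is hypothetical, and the 20:1 ratio is an approximation rather than an exact constant.

```python
# Hypothetical helper: split a FLOP budget C ~= 6 * N * D so that
# tokens D ~= 20 * parameters N (the Chinchilla rule of thumb).
def chinchilla_optimal(flops: float, tokens_per_param: float = 20.0):
    """Return (params, tokens) spending `flops` at the ~20:1 ratio,
    using the standard C ≈ 6 * N * D training-cost approximation."""
    n_params = (flops / (6.0 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# Example: Chinchilla's ~5.76e23 FLOP budget recovers roughly
# 70B parameters and 1.4T tokens.
n, d = chinchilla_optimal(5.76e23)
```

Because optimal N grows only as roughly the square root of compute, doubling the budget argues for scaling data and parameters together rather than parameters alone.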