Loss Landscape
The multi-dimensional surface formed by the loss function across all possible model parameter values, which optimization algorithms navigate to find good solutions.
Visualizing the Landscape
Imagine a hilly terrain where elevation represents loss and position represents parameter values. Training with gradient descent is like rolling a ball downhill to find a valley (a local minimum). In reality the landscape has one dimension per model parameter, often millions.
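The downhill analogy can be sketched in a few lines. Here is a minimal gradient descent on a hypothetical two-parameter loss surface (an elongated bowl); the surface, step size, and step count are illustrative choices, not a prescribed recipe.

```python
import numpy as np

# Hypothetical 2-parameter loss surface: an elongated bowl with its
# valley floor at (0, 0). Elevation = loss, position = parameters.
def loss(w):
    return w[0] ** 2 + 10 * w[1] ** 2

def grad(w):
    return np.array([2 * w[0], 20 * w[1]])

w = np.array([3.0, 2.0])   # starting position on the terrain
lr = 0.05                  # step size for each move downhill
for _ in range(200):
    w = w - lr * grad(w)   # roll the ball one step downhill

print(loss(w))  # loss is now very close to zero: the ball reached the valley
```

Note how the bowl is steeper in the second dimension; a step size that suits one direction can be too large or too small for another, which is one reason the landscape's shape matters for choosing a learning rate.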
Key Features
Local minima: valleys that are not the lowest point on the surface.
Saddle points: points that are minima along some dimensions but maxima along others.
Flat regions: plateaus where gradients are near zero and progress stalls.
Modern research suggests that in large networks most local minima are nearly as good as the global minimum.
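A saddle point can be shown concretely with the classic surface f(x, y) = x^2 - y^2, a hypothetical example chosen for illustration: the gradient vanishes at the origin, which is a minimum along x but a maximum along y.

```python
import numpy as np

# Classic saddle: f(x, y) = x**2 - y**2 has zero gradient at the origin,
# but the origin is a minimum along x and a maximum along y.
def grad(w):
    return np.array([2 * w[0], -2 * w[1]])

# Starting exactly on the saddle's axis, descent stalls at the origin.
w = np.array([1.0, 0.0])
for _ in range(100):
    w = w - 0.1 * grad(w)
print(w)  # both coordinates are (numerically) zero: stuck at the saddle

# A tiny perturbation off the axis lets descent escape along -y**2.
w = np.array([1.0, 1e-6])
for _ in range(100):
    w = w - 0.1 * grad(w)
print(abs(w[1]))  # the y-coordinate has grown large: the ball escaped downhill
```

In high dimensions such saddle points are far more common than true local minima, which is why stalling gradients often indicate a saddle or plateau rather than a genuine valley.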
Implications for Training
The shape of the loss landscape affects which optimizer works best, what learning rate to use, and whether training converges at all. Wide, flat minima tend to generalize better than sharp ones. Techniques such as learning rate warmup and cyclical learning rates help optimizers navigate the landscape.
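To make the warmup idea concrete, here is a minimal schedule sketch: the learning rate ramps linearly from zero to a peak over the first steps, then decays linearly. All hyperparameter values here (peak rate, warmup length, total steps) are hypothetical, and real frameworks provide their own scheduler utilities.

```python
# Minimal linear warmup + linear decay schedule (hypothetical hyperparameters).
def lr_at_step(step, peak_lr=1e-3, warmup_steps=100, total_steps=1000):
    if step < warmup_steps:
        # Warmup: ramp from 0 up to peak_lr over the first warmup_steps.
        return peak_lr * step / warmup_steps
    # Decay: ramp linearly from peak_lr back down to 0 by total_steps.
    frac = (step - warmup_steps) / (total_steps - warmup_steps)
    return peak_lr * (1.0 - frac)

print(lr_at_step(50))    # mid-warmup: half the peak rate
print(lr_at_step(100))   # end of warmup: the peak rate
print(lr_at_step(1000))  # end of training: decayed to zero
```

Small early steps avoid large, destabilizing moves before the optimizer has found a reasonable region of the landscape; the later decay lets it settle into a minimum.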