Normalization
Techniques that standardize data or neural network activations to a consistent scale, improving training stability and convergence speed.
Data Normalization
Min-max scaling: Scales features to [0, 1].
Standard scaling (z-score): Transforms features to mean 0, standard deviation 1.
Robust scaling: Uses the median and interquartile range (IQR), so it is resistant to outliers.
All three are applied to input features before training.
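A minimal NumPy sketch of the three scalers above; the function names are illustrative, and each operates per feature (per column), as these scalers are applied in practice.

```python
import numpy as np

def min_max_scale(x):
    # Scale each feature (column) to the range [0, 1].
    lo, hi = x.min(axis=0), x.max(axis=0)
    return (x - lo) / (hi - lo)

def standard_scale(x):
    # Z-score: subtract the per-feature mean, divide by the std.
    return (x - x.mean(axis=0)) / x.std(axis=0)

def robust_scale(x):
    # Center on the median and divide by the IQR (75th - 25th
    # percentile), so extreme values have little influence.
    med = np.median(x, axis=0)
    q1, q3 = np.percentile(x, [25, 75], axis=0)
    return (x - med) / (q3 - q1)

X = np.array([[1.0,  200.0],
              [2.0,  400.0],
              [3.0, 1000.0]])
print(min_max_scale(X))  # each column now spans [0, 1]
```

Note that min-max and standard scaling both shift with outliers (a single extreme value stretches the range or inflates the std), which is why robust scaling is preferred for heavy-tailed data.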
Activation Normalization
Batch Normalization: Normalizes each feature across the batch dimension.
Layer Normalization: Normalizes each sample across the feature dimension (used in transformers).
RMSNorm: A simplified layer norm that rescales by the root mean square without mean centering.
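The three activation norms differ mainly in which axis they reduce over and whether they center. A minimal NumPy sketch (learnable gain/bias parameters, usually called gamma and beta, are omitted for clarity):

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    # Normalize each feature across the batch dimension (axis 0).
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    return (x - mu) / np.sqrt(var + eps)

def layer_norm(x, eps=1e-5):
    # Normalize each sample across its feature dimension (axis -1).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def rms_norm(x, eps=1e-5):
    # RMSNorm: divide by the root mean square only; no centering.
    rms = np.sqrt((x ** 2).mean(axis=-1, keepdims=True) + eps)
    return x / rms
```

Because layer norm and RMSNorm reduce over features rather than the batch, they behave identically at batch size 1 and need no running statistics at inference, which is one reason transformers favor them over batch norm.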
Why It Helps
Prevents features with large values from dominating the loss.
Stabilizes gradient flow in deep networks.
Reduces sensitivity to learning rate and weight initialization.
Enables faster training with larger learning rates.