Batch Normalization
A technique that normalizes layer inputs by adjusting and scaling activations across a mini-batch, stabilizing and accelerating neural network training.
How It Works
For each mini-batch, compute the mean and variance of activations, normalize to zero mean and unit variance, then apply learnable scale and shift parameters. This is done independently for each feature/channel. At inference time, per-batch statistics are replaced by running estimates of the mean and variance accumulated during training, so the output no longer depends on batch composition.
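The per-batch computation above can be sketched in NumPy; the function name, shapes, and the epsilon value are illustrative, and running statistics for inference are omitted for brevity:

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature across the batch, then scale and shift.

    x: array of shape (batch, features)
    gamma, beta: learnable per-feature scale and shift, shape (features,)
    """
    mean = x.mean(axis=0)            # per-feature mean over the batch
    var = x.var(axis=0)              # per-feature variance over the batch
    x_hat = (x - mean) / np.sqrt(var + eps)  # zero mean, unit variance
    return gamma * x_hat + beta      # learnable scale and shift

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(32, 4))
y = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))
# with gamma=1, beta=0, each feature of y has ~zero mean and ~unit variance
```

Note that the normalization axis is the batch axis (axis=0): statistics are shared across samples but computed separately for each feature.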
Benefits
Enables higher learning rates (faster training), reduces sensitivity to weight initialization, acts as a mild regularizer (reducing overfitting), and helps mitigate the vanishing/exploding gradient problem.
Alternatives
Layer Normalization: Normalizes across features instead of across the batch, so each sample is normalized independently. Used in transformers because it works with variable batch sizes and sequence lengths. RMSNorm: A simplified layer norm that skips the mean-centering step and rescales by the root mean square alone; used in LLaMA and other modern LLMs.
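The difference between these alternatives and batch norm is mainly which axis the statistics are computed over. A minimal sketch of both (without the learnable scale parameters, which both typically also carry; epsilon values are illustrative):

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize across features (last axis), independently per sample,
    # so the result does not depend on the batch at all.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def rms_norm(x, eps=1e-5):
    # RMSNorm: skip mean subtraction, divide by the root mean square.
    rms = np.sqrt((x ** 2).mean(axis=-1, keepdims=True) + eps)
    return x / rms
```

Because both normalize along the feature axis, they behave identically at batch size 1, which is what makes them practical for autoregressive inference.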