AI Glossary

Vanishing Gradient Problem

A training difficulty where gradients become extremely small as they propagate backward through many layers, preventing early layers from learning effectively.

Causes

When using activation functions like sigmoid or tanh, each layer multiplies the backpropagated gradient by the activation's local derivative, which is at most 0.25 for sigmoid and at most 1 for tanh (and usually much smaller away from zero). After many layers, the product of these small factors approaches zero, so early layers receive essentially no learning signal.
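The shrinking product can be seen directly with a small numerical sketch. The code below (my illustration, not from the source; the pre-activations are made-up random values) multiplies a gradient by the sigmoid's local derivative across 30 layers:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
n_layers = 30
grad = np.ones(10)  # start with a unit gradient for 10 hypothetical units

for _ in range(n_layers):
    pre = rng.normal(size=10)               # hypothetical pre-activations at this layer
    s = sigmoid(pre)
    grad = grad * s * (1.0 - s)             # sigmoid's local derivative is at most 0.25

print(np.max(np.abs(grad)))  # vanishingly small: bounded above by 0.25**30
```

Since every factor is at most 0.25, the gradient after 30 layers is bounded by 0.25^30 ≈ 10^-18, far below anything a learning rate can compensate for.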

Solutions

- ReLU activation (gradient is 1 for positive inputs)
- Skip/residual connections
- Batch/layer normalization
- Better weight initialization (He, Xavier)
- LSTM gates (for sequence models)

Together, these solutions enabled training networks with hundreds of layers.
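Skip connections help because they change the multiplicative structure of backpropagation: the gradient through a residual block y = x + f(x) is 1 + f'(x), so the identity path keeps each factor near 1. A minimal sketch (my illustration, assuming hypothetical small branch derivatives in place of real networks):

```python
import numpy as np

rng = np.random.default_rng(1)
n_layers = 100
local = rng.uniform(-0.1, 0.1, size=n_layers)  # hypothetical per-layer branch derivatives f'(x)

# Plain stacking: the gradient is a product of small local derivatives.
plain = np.prod(local)

# Residual stacking: each factor is 1 + f'(x), so the product stays near 1.
residual = np.prod(1.0 + local)

print(abs(plain))   # vanishingly small
print(residual)     # order 1: the identity path preserves the gradient
```

The same effect explains LSTM gates: the cell-state path is close to an identity connection through time, so gradients survive across many steps.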


Last updated: March 5, 2026