AI Glossary

Weight Initialization

The method used to set initial values of neural network weights before training begins, crucial for training stability and convergence.

Key Methods

Xavier/Glorot: Designed for sigmoid/tanh activations. Scales the weight standard deviation by 1/sqrt(fan_in); some variants average fan_in and fan_out.
He/Kaiming: Designed for ReLU activations. Scales by sqrt(2/fan_in) to compensate for ReLU zeroing half the activations.
Random normal: Simple, but a fixed standard deviation that ignores layer width can destabilize deep networks.
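The two scaling rules above can be sketched in NumPy. This is a minimal illustration, not a library API; the function names are ours, and both draw from a normal distribution with the standard deviation each scheme prescribes:

```python
import numpy as np

rng = np.random.default_rng(0)

def xavier_init(fan_in, fan_out):
    """Xavier/Glorot: weights ~ N(0, 1/fan_in); suits sigmoid/tanh."""
    return rng.normal(0.0, 1.0 / np.sqrt(fan_in), size=(fan_in, fan_out))

def he_init(fan_in, fan_out):
    """He/Kaiming: weights ~ N(0, 2/fan_in); suits ReLU."""
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))
```

For a wide layer the empirical standard deviation of `he_init(512, 512)` lands close to sqrt(2/512) ≈ 0.0625, which is exactly the scaling the rule prescribes.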

Why It Matters

Poor initialization causes vanishing or exploding gradients. Too small and signals disappear; too large and they explode. Proper initialization ensures gradients and activations remain in a healthy range throughout the network.


Last updated: March 5, 2026