Weight Initialization
The scheme used to set the initial values of a neural network's weights before training begins; the choice is crucial for training stability and convergence speed.
Key Methods
Xavier/Glorot: Designed for sigmoid/tanh activations. Scales weights by sqrt(2 / (fan_in + fan_out)), which is roughly 1/sqrt(fan_in) when layers have similar input and output widths.
He/Kaiming: Designed for ReLU. Scales weights by sqrt(2 / fan_in); the extra factor of 2 compensates for ReLU zeroing out roughly half of its inputs.
Random normal with a fixed standard deviation: Simple, but because the scale ignores layer width, it tends to shrink or amplify signals in deep networks.
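A minimal NumPy sketch of the two scaled schemes (the function names and the normal-distribution variants here are illustrative; libraries also ship uniform variants of both):

```python
import numpy as np

rng = np.random.default_rng(0)

def xavier_init(fan_in, fan_out):
    # Glorot/Xavier (normal variant): std = sqrt(2 / (fan_in + fan_out)),
    # chosen to keep activation variance stable under tanh/sigmoid.
    std = np.sqrt(2.0 / (fan_in + fan_out))
    return rng.normal(0.0, std, size=(fan_in, fan_out))

def he_init(fan_in, fan_out):
    # He/Kaiming (normal variant): std = sqrt(2 / fan_in); the factor of 2
    # compensates for ReLU zeroing out roughly half of its inputs.
    std = np.sqrt(2.0 / fan_in)
    return rng.normal(0.0, std, size=(fan_in, fan_out))

# Example: initialize a 784 -> 256 layer for a ReLU network.
W = he_init(784, 256)
print(W.std())  # close to sqrt(2/784) ~= 0.0505
```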
Why It Matters
Poor initialization causes vanishing or exploding gradients: if weights are too small, signals shrink toward zero as they propagate; if too large, they grow exponentially with depth. Proper initialization keeps the variance of activations and gradients roughly constant from layer to layer, so both stay in a healthy range throughout the network.
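To see the effect, here is a small sketch (assuming a 50-layer, 256-unit ReLU network with random weights; the depth and width are illustrative) comparing a too-small scale, the He scale, and a too-large scale:

```python
import numpy as np

rng = np.random.default_rng(0)
width, depth = 256, 50
x = rng.normal(size=(1, width))

for std in (0.01, np.sqrt(2.0 / width), 0.2):
    h = x
    for _ in range(depth):
        W = rng.normal(0.0, std, size=(width, width))
        h = np.maximum(0.0, h @ W)  # ReLU layer
    # The mean absolute activation after `depth` layers shows whether the
    # signal vanished, stayed in range, or exploded.
    print(f"std={std:.4f} -> activation scale: {np.abs(h).mean():.3e}")
```

Only the He-scaled run keeps the signal near its original magnitude; the other two shrink toward zero or blow up within a few dozen layers.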