Mixed-Precision Training
Training neural networks with a mix of 16-bit and 32-bit floating-point arithmetic to reduce memory usage and increase throughput.
Overview
Mixed-precision training uses a combination of FP16 (half-precision) and FP32 (single-precision) floating-point arithmetic during neural network training. Most operations use FP16 for speed and memory savings, while a master copy of the weights is maintained in FP32 so that small parameter updates are not lost to FP16 rounding.
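Why the FP32 master copy matters can be shown directly with floating-point arithmetic. The sketch below uses NumPy (not any specific training framework) and hypothetical values: a weight near 1.0 receiving a small update, as is common late in training.

```python
import numpy as np

weight = 1.0
update = 1e-4  # e.g. learning_rate * gradient (illustrative value)

# In FP16, the spacing between representable numbers near 1.0 is
# 2**-10 ≈ 0.00098, so an update of 1e-4 is lost to rounding.
fp16_result = np.float16(weight) + np.float16(update)
print(fp16_result == np.float16(1.0))  # True: the update vanished

# An FP32 master copy of the weight retains the update.
fp32_result = np.float32(weight) + np.float32(update)
print(fp32_result == np.float32(1.0))  # False: the update survived
```

This is why updates are applied to the FP32 master weights, which are then cast down to FP16 for the next forward pass.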
Key Details
This technique typically provides 2-3x training speedups on Tensor Core GPUs and roughly halves memory usage for weights and activations, with minimal accuracy loss. Loss scaling prevents FP16 gradient underflow: the loss is multiplied by a large factor before backpropagation, and the resulting gradients are divided by the same factor before the weight update. Mixed precision is standard practice for training large models on modern GPUs (NVIDIA Tensor Cores are optimized for it) and is supported by frameworks such as PyTorch AMP and TensorFlow mixed precision.
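The loss-scaling mechanism can be sketched with plain NumPy. The gradient value and scale factor below are illustrative (65536 happens to be the default initial scale in PyTorch's GradScaler, but nothing framework-specific is used here).

```python
import numpy as np

SCALE = 2.0 ** 16  # 65536, a typical loss-scale factor

# A hypothetical tiny gradient, below FP16's smallest subnormal (~6e-8).
grad = 1e-8
print(np.float16(grad))  # 0.0: the gradient underflows in FP16

# Scaling the loss by SCALE scales every gradient by SCALE too,
# moving it into FP16's representable range.
scaled = np.float16(grad * SCALE)  # ≈ 6.55e-4, representable in FP16

# Unscale in FP32 before the optimizer step to recover the true value.
recovered = np.float32(scaled) / SCALE
print(recovered)  # ≈ 1e-8
```

In practice, frameworks also adjust the scale dynamically: if scaled gradients overflow to infinity, the step is skipped and the scale is reduced.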