Mini-Batch
A small subset of the training dataset used for a single parameter update during gradient descent, balancing the efficiency of batch processing with the regularizing effect of noise.
How It Works
The training data is shuffled and divided into mini-batches (e.g., 32 or 64 examples each). The model processes one mini-batch, computes the loss and gradients, updates weights, then moves to the next mini-batch. One pass through all mini-batches is one epoch.
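The loop described above can be sketched as follows; the linear-regression model, batch size of 32, and learning rate are illustrative assumptions, not part of the definition:

```python
import numpy as np

# Minimal sketch of mini-batch gradient descent on a toy linear-regression
# problem. Dataset, model, and hyperparameters are assumptions for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 3))              # 256 examples, 3 features
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=256)

w = np.zeros(3)                            # model parameters
batch_size, lr, epochs = 32, 0.1, 20

for epoch in range(epochs):
    # Shuffle once per epoch, then step through contiguous mini-batches.
    perm = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        idx = perm[start:start + batch_size]
        Xb, yb = X[idx], y[idx]
        # Gradient of mean-squared error on this mini-batch only.
        grad = 2 * Xb.T @ (Xb @ w - yb) / len(Xb)
        w -= lr * grad                     # one parameter update per mini-batch
```

Each inner iteration is one parameter update; one full pass over all 8 mini-batches (256 / 32) is one epoch.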
Why Mini-Batches
Full-batch gradient descent is slow and memory-intensive, while single-example SGD produces noisy updates and wastes hardware parallelism. Mini-batches strike a balance: efficient GPU utilization, reasonably stable gradient estimates, and enough noise to act as a mild regularizer.
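The noise/stability trade-off can be made concrete: the variance of a mini-batch gradient estimate shrinks roughly in proportion to 1/batch_size. A small sketch, with an assumed toy dataset, measuring the spread of gradient estimates at two batch sizes:

```python
import numpy as np

# Illustrative sketch: larger mini-batches give lower-variance gradient
# estimates. The dataset and evaluation point are assumptions.
rng = np.random.default_rng(1)
X = rng.normal(size=(4096, 2))
w_true = np.array([1.0, -2.0])
y = X @ w_true + rng.normal(scale=0.5, size=4096)

def batch_grad(idx, theta=np.zeros(2)):
    # MSE gradient evaluated on the examples selected by idx.
    Xb, yb = X[idx], y[idx]
    return 2 * Xb.T @ (Xb @ theta - yb) / len(Xb)

def grad_spread(batch_size, trials=200):
    # Sample many random mini-batches and measure the spread of the
    # resulting gradient estimates around their mean.
    grads = [batch_grad(rng.choice(len(X), batch_size, replace=False))
             for _ in range(trials)]
    return float(np.linalg.norm(np.std(grads, axis=0)))

small, large = grad_spread(8), grad_spread(128)
print(small, large)  # the batch-size-128 estimate varies much less
```

This is why a batch of 32 or 64 already behaves far more stably than single-example SGD while still leaving useful stochasticity in the updates.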