Batch Processing
Processing multiple data samples simultaneously rather than one at a time, leveraging GPU parallelism for efficient training and inference of AI models.
In Training
Mini-batch gradient descent computes the gradient on a batch of examples (typically 8-4096) per parameter update. Larger batches yield more stable gradient estimates but require more memory. The batch size affects learning dynamics, generalization, and training speed.
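A minimal sketch of mini-batch gradient descent on a toy linear-regression problem (the data, learning rate, and batch size here are illustrative assumptions, not values from the entry):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y = 3x + 1 plus noise (hypothetical example).
X = rng.normal(size=(1000, 1))
y = 3.0 * X[:, 0] + 1.0 + 0.1 * rng.normal(size=1000)

w, b = 0.0, 0.0           # model parameters
lr, batch_size = 0.1, 32  # learning rate and mini-batch size

for epoch in range(20):
    perm = rng.permutation(len(X))  # reshuffle each epoch
    for start in range(0, len(X), batch_size):
        idx = perm[start:start + batch_size]
        xb, yb = X[idx, 0], y[idx]
        err = (w * xb + b) - yb
        # Gradients of mean squared error, averaged over the mini-batch.
        w -= lr * (2 * err * xb).mean()
        b -= lr * (2 * err).mean()

print(round(w, 2), round(b, 2))  # should approach 3.0 and 1.0
```

Each inner-loop iteration is one update on one batch; the GPU (or here, NumPy's vectorized ops) processes the whole batch in parallel.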
In Inference
Continuous batching (also called in-flight batching) admits newly arrived requests into a running batch at each decoding step, rather than waiting for the whole batch to finish. Dynamic batching waits briefly to group queued requests up to a size limit. Both techniques maximize GPU utilization, which is critical for serving LLMs cost-effectively at scale.
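A minimal sketch of the dynamic-batching idea using only the standard library: collect up to a maximum batch size, but never wait longer than a small timeout (the function name, batch size, and timeout are illustrative assumptions):

```python
import queue
import time

def collect_batch(q, max_batch=8, max_wait_s=0.01):
    """Pull up to max_batch requests from q, waiting at most
    max_wait_s for the batch to fill (hypothetical helper)."""
    batch = [q.get()]  # block until at least one request arrives
    deadline = time.monotonic() + max_wait_s
    while len(batch) < max_batch:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            batch.append(q.get(timeout=remaining))
        except queue.Empty:
            break  # queue drained before the batch filled
    return batch

q = queue.Queue()
for i in range(5):
    q.put(f"request-{i}")
batch = collect_batch(q)
print(batch)  # all five queued requests fit in one batch
```

A real serving system would run this loop on a dedicated thread and hand each batch to the model; continuous batching goes further by re-forming the batch at every decoding step instead of once per request group.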
Trade-offs
Larger batches: better GPU utilization and smoother gradient estimates, but higher memory usage and potentially worse generalization. Smaller batches: noisier gradients (which can aid exploration), lower memory usage, but longer wall-clock time per epoch.
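One standard way around the memory side of this trade-off is gradient accumulation: run several small micro-batches, sum their gradients, and apply one update, which matches the large-batch gradient at a fraction of the peak memory. A sketch of the equivalence (the gradient values are illustrative random data):

```python
import numpy as np

rng = np.random.default_rng(1)
# Pretend these are gradients from 4 micro-batches of a 3-parameter model.
grads = rng.normal(size=(4, 3))

# Large-batch gradient: average over everything at once (high peak memory).
big = grads.mean(axis=0)

# Gradient accumulation: add up micro-batch gradients, divide by the count,
# then take a single optimizer step -- same update, 1/4 the peak memory.
acc = np.zeros(3)
for g in grads:
    acc += g
acc /= len(grads)

print(np.allclose(big, acc))  # True
```

This equivalence holds for plain SGD with mean-reduced losses; optimizers with per-step state (e.g. momentum updated every micro-batch) would break it unless the step is applied only after accumulation.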