AI Glossary

Batch Processing

Processing multiple data samples simultaneously rather than one at a time, leveraging GPU parallelism for efficient training and inference of AI models.

In Training

Mini-batch gradient descent processes a batch of examples (typically 8–4096) per parameter update. Larger batches provide more stable gradient estimates but require more memory. The batch size affects learning dynamics, generalization, and training speed.
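A minimal sketch of mini-batch gradient descent for linear regression with NumPy: each parameter update uses the averaged gradient over one shuffled batch. The function name, learning rate, and epoch count are illustrative choices, not a reference implementation.

```python
import numpy as np

def minibatch_sgd(X, y, batch_size=32, lr=0.1, epochs=20, seed=0):
    """Fit linear-regression weights with mini-batch gradient descent."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        order = rng.permutation(n)                 # reshuffle each epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            # MSE gradient averaged over the batch
            grad = 2 * Xb.T @ (Xb @ w - yb) / len(idx)
            w -= lr * grad                          # one update per batch
    return w

# Usage: recover the true weights [2, -3] from synthetic data
rng = np.random.default_rng(1)
X = rng.normal(size=(256, 2))
y = X @ np.array([2.0, -3.0])
w = minibatch_sgd(X, y)
```

With 256 examples and a batch size of 32, each epoch performs 8 parameter updates rather than one full-dataset update.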

In Inference

Continuous batching admits new requests into an in-flight batch as earlier requests finish, rather than waiting for the whole batch to complete. Dynamic batching adjusts batch size based on request queue length. Both techniques maximize GPU utilization and are critical for serving LLMs cost-effectively at scale.
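The queue-length-based idea can be sketched as a simple drain loop: pull however many requests are waiting, up to a fixed cap, into one inference batch. The function and cap are hypothetical; a production server would also use a timeout so lightly loaded queues are not starved.

```python
from collections import deque

def drain_batch(queue, max_batch=8):
    """Pull up to max_batch queued requests into one inference batch.
    Illustrative sketch: batch size tracks queue length up to a cap."""
    batch = []
    while queue and len(batch) < max_batch:
        batch.append(queue.popleft())
    return batch

# Usage: 11 queued requests become one full batch and one partial batch
queue = deque(f"req-{i}" for i in range(11))
batches = []
while queue:
    batches.append(drain_batch(queue, max_batch=8))

print([len(b) for b in batches])  # → [8, 3]
```

The cap bounds GPU memory per forward pass, while draining whatever is queued keeps utilization high under bursty traffic.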

Trade-offs

Larger batches: better GPU utilization, smoother gradients, but higher memory usage and potentially worse generalization. Smaller batches: noisier gradients (which can aid exploration), lower memory usage, but longer wall-clock training time.
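The "smoother gradients" half of this trade-off can be demonstrated numerically: averaging more per-example gradients per batch shrinks the noise in the gradient estimate roughly as 1/sqrt(batch size). The per-example gradients below are simulated as a true value plus unit-variance noise, an assumption made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Simulated per-example gradients: true gradient 1.0 plus unit-variance noise
per_example = 1.0 + rng.normal(size=100_000)

stds = []
for b in (8, 64, 512):
    usable = (len(per_example) // b) * b
    # Each row is one batch; its mean is one gradient estimate
    batch_means = per_example[:usable].reshape(-1, b).mean(axis=1)
    stds.append(batch_means.std())
    print(f"batch={b:4d}  gradient-estimate std={batch_means.std():.3f}")
# Std shrinks roughly as 1/sqrt(b): larger batches give smoother gradients,
# at the cost of more memory per step.
```

This is why an 8x larger batch buys only about a 2.8x reduction in gradient noise, one reason returns diminish as batch size grows.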


Last updated: March 5, 2026