Batch Processing
Processing multiple data samples simultaneously rather than one at a time, leveraging GPU parallelism for efficient training and inference of AI models.
In Training
Mini-batch gradient descent computes the gradient on a batch of examples (typically 8-4096) per parameter update. Larger batches yield more stable gradient estimates but require more memory. The batch size affects learning dynamics, generalization, and training speed.
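A minimal sketch of mini-batch gradient descent on a toy linear-regression problem (the data, learning rate, and batch size here are illustrative assumptions, not values from the entry):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y = 3x + 1 plus noise (hypothetical example).
X = rng.normal(size=(1000, 1))
y = 3.0 * X[:, 0] + 1.0 + 0.1 * rng.normal(size=1000)

w, b = 0.0, 0.0           # model parameters
lr, batch_size = 0.1, 32  # learning rate and mini-batch size

for epoch in range(20):
    perm = rng.permutation(len(X))  # reshuffle each epoch
    for start in range(0, len(X), batch_size):
        idx = perm[start:start + batch_size]
        xb, yb = X[idx, 0], y[idx]
        err = (w * xb + b) - yb
        # Gradients of mean squared error, averaged over the mini-batch.
        w -= lr * (2 * err * xb).mean()
        b -= lr * (2 * err).mean()

print(round(w, 2), round(b, 2))  # should approach 3.0 and 1.0
```

Each inner-loop iteration is one update on one batch; the GPU (or here, NumPy's vectorized ops) processes the whole batch in parallel.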
In Inference
Continuous batching (also called in-flight batching) admits newly arrived requests into a running batch at each decoding step, rather than waiting for the whole batch to finish. Dynamic batching waits briefly to group queued requests up to a size limit. Both techniques maximize GPU utilization, which is critical for serving LLMs cost-effectively at scale.
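A minimal sketch of the dynamic-batching idea using only the standard library: collect up to a maximum batch size, but never wait longer than a small timeout (the function name, batch size, and timeout are illustrative assumptions):

```python
import queue
import time

def collect_batch(q, max_batch=8, max_wait_s=0.01):
    """Pull up to max_batch requests from q, waiting at most
    max_wait_s for the batch to fill (hypothetical helper)."""
    batch = [q.get()]  # block until at least one request arrives
    deadline = time.monotonic() + max_wait_s
    while len(batch) < max_batch:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            batch.append(q.get(timeout=remaining))
        except queue.Empty:
            break  # queue drained before the batch filled
    return batch

q = queue.Queue()
for i in range(5):
    q.put(f"request-{i}")
batch = collect_batch(q)
print(batch)  # all five queued requests fit in one batch
```

A real serving system would run this loop on a dedicated thread and hand each batch to the model; continuous batching goes further by re-forming the batch at every decoding step instead of once per request group.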
Trade-offs
Larger batches: better GPU utilization and smoother gradient estimates, but higher memory usage and potentially worse generalization. Smaller batches: noisier gradients (which can aid exploration), lower memory usage, but longer wall-clock time per epoch.
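One standard way around the memory side of this trade-off is gradient accumulation: run several small micro-batches, sum their gradients, and apply one update, which matches the large-batch gradient at a fraction of the peak memory. A sketch of the equivalence (the gradient values are illustrative random data):

```python
import numpy as np

rng = np.random.default_rng(1)
# Pretend these are gradients from 4 micro-batches of a 3-parameter model.
grads = rng.normal(size=(4, 3))

# Large-batch gradient: average over everything at once (high peak memory).
big = grads.mean(axis=0)

# Gradient accumulation: add up micro-batch gradients, divide by the count,
# then take a single optimizer step -- same update, 1/4 the peak memory.
acc = np.zeros(3)
for g in grads:
    acc += g
acc /= len(grads)

print(np.allclose(big, acc))  # True
```

This equivalence holds for plain SGD with mean-reduced losses; optimizers with per-step state (e.g. momentum updated every micro-batch) would break it unless the step is applied only after accumulation.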