AI Glossary

Batch Size

The number of training examples processed together in one forward/backward pass during neural network training.

Why It Matters

Batch size affects training speed, memory usage, and model quality. Larger batches enable better GPU utilization and more stable gradient estimates. Smaller batches introduce noise that can help generalization.
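The stability claim can be illustrated with a small sketch: treat each per-example "gradient" as a noisy estimate of a true value, and compare how much batch gradients (means over the batch) vary at two batch sizes. The numbers here are illustrative, not from any real model.

```python
import random
import statistics

random.seed(0)

# Hypothetical per-example "gradients": the true gradient is 1.0, with
# Gaussian noise per example. A batch gradient is the mean over the batch.
def batch_gradient(batch_size):
    return statistics.mean(random.gauss(1.0, 0.5) for _ in range(batch_size))

# Draw many batch gradients at two batch sizes and compare their spread.
small = [batch_gradient(4) for _ in range(1000)]
large = [batch_gradient(64) for _ in range(1000)]

print(f"std of batch-size-4 gradients:  {statistics.stdev(small):.3f}")
print(f"std of batch-size-64 gradients: {statistics.stdev(large):.3f}")
```

The larger batch produces a noticeably tighter spread of gradient estimates, which is the "more stable" effect; the residual noise in the small-batch estimates is what can act as a regularizer.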

Common Choices

Typical batch sizes range from 16 to 512 for most tasks. LLM training uses much larger batches (thousands to millions of tokens). The optimal batch size depends on the model, dataset, and available hardware.
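The arithmetic of batching is simple: a dataset of N examples splits into ceil(N / batch_size) batches, with the last batch possibly smaller. A minimal sketch (function name is illustrative):

```python
# Split a dataset of N examples into mini-batches.
# The last batch may be smaller when N is not divisible by batch_size.
def make_batches(dataset, batch_size):
    return [dataset[i:i + batch_size] for i in range(0, len(dataset), batch_size)]

examples = list(range(100))          # a toy dataset of 100 examples
batches = make_batches(examples, 32)

print(len(batches))                  # 4 batches
print([len(b) for b in batches])     # [32, 32, 32, 4]
```

Real frameworks expose the same choice, often with an option to drop the ragged final batch so every step sees a uniform batch size.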

Mini-Batch Gradient Descent

In practice, almost all training uses mini-batches: a compromise between processing one example at a time (stochastic gradient descent) and the entire dataset at once (full-batch gradient descent). This balances computational efficiency with the regularizing effect of gradient noise.
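The mini-batch loop can be sketched end to end on a toy problem: fitting y = w * x (true w = 2.0) with squared-error loss, shuffling each epoch and taking one update per batch. The learning rate and batch size here are illustrative choices, not recommendations.

```python
import random

random.seed(0)

# Toy dataset with an exact linear relationship y = 2.0 * x.
data = [(x, 2.0 * x) for x in [i / 10 for i in range(-50, 50)]]

w, lr, batch_size = 0.0, 0.01, 16
for epoch in range(20):
    random.shuffle(data)                      # reshuffle each epoch
    for i in range(0, len(data), batch_size):
        batch = data[i:i + batch_size]
        # Gradient of mean squared error wrt w, averaged over the batch.
        grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
        w -= lr * grad                        # one update per mini-batch

print(f"learned w = {w:.3f}")                 # converges near 2.0
```

Setting batch_size to 1 recovers stochastic gradient descent and batch_size to len(data) recovers full-batch gradient descent; the mini-batch setting sits between the two.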

Last updated: March 5, 2026