AI Glossary

SGD (Stochastic Gradient Descent)

The foundational optimization algorithm that updates model weights using the gradient computed from a single example or mini-batch, introducing beneficial noise into the training process.
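The update rule described above can be sketched in a few lines. This is a minimal, illustrative example (toy linear-regression loss, learning rate, and batch size are assumptions, not from the original text):

```python
import numpy as np

def sgd_step(w, grad, lr=0.05):
    """One SGD update: move weights against the mini-batch gradient."""
    return w - lr * grad

# Toy linear regression: loss = mean((x @ w - y)**2) per mini-batch.
rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0])   # target weights we hope to recover
w = np.zeros(2)

for _ in range(500):
    x = rng.normal(size=(8, 2))              # mini-batch of 8 examples
    y = x @ w_true
    grad = 2 * x.T @ (x @ w - y) / len(x)    # MSE gradient on the batch
    w = sgd_step(w, grad)

print(w)  # close to w_true after training
```

Each step sees only 8 examples, yet the weights converge to the target: the noisy per-batch gradients average out over many updates.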

Why 'Stochastic'?

Unlike full-batch gradient descent, which computes the exact gradient over the entire dataset, SGD estimates it from a random subset, hence 'stochastic'. This noise helps the optimizer escape local minima and acts as implicit regularization.
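The claim that a mini-batch gradient is a noisy but unbiased estimate of the full-batch gradient can be checked numerically. A small sketch, assuming a squared-error loss and a synthetic dataset (all names and sizes here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + rng.normal(scale=0.1, size=1000)
w = np.zeros(3)

def grad(xb, yb, w):
    """MSE gradient on a batch (xb, yb)."""
    return 2 * xb.T @ (xb @ w - yb) / len(xb)

full = grad(X, y, w)  # exact full-batch gradient

# Individual mini-batch gradients are noisy, but their average
# over many random batches recovers the full-batch gradient.
samples = []
for _ in range(2000):
    i = rng.choice(1000, size=32, replace=False)
    samples.append(grad(X[i], y[i], w))
avg = np.mean(samples, axis=0)

print(avg)   # approximately equal to `full`
```

The spread of the individual `samples` around `full` is the "beneficial noise" the entry refers to; shrinking the batch size increases it, enlarging the batch size reduces it.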

SGD with Momentum

Adds a 'velocity' term that accumulates an exponentially decaying sum of past gradients, helping SGD navigate ravines (loss surfaces much steeper in one direction than another) and accelerate along consistently downhill directions. SGD with momentum and learning-rate scheduling remains competitive with Adam for many tasks, especially in computer vision.
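The ravine behavior can be seen on an ill-conditioned quadratic. A sketch of the classical (heavy-ball) momentum update, with an assumed momentum coefficient of 0.9 and a toy loss not taken from the original text:

```python
import numpy as np

def momentum_step(w, v, grad, lr=0.01, beta=0.9):
    """SGD with momentum: velocity accumulates past gradients."""
    v = beta * v + grad   # decaying sum of gradients
    w = w - lr * v        # step along the velocity
    return w, v

# "Ravine" loss 0.5 * w^T A w: steep along axis 0, shallow along axis 1.
A = np.diag([100.0, 1.0])
def grad_f(w):
    return A @ w

w_plain = np.array([1.0, 1.0])
w_mom = w_plain.copy()
v = np.zeros(2)

for _ in range(200):
    w_plain = w_plain - 0.01 * grad_f(w_plain)          # plain SGD
    w_mom, v = momentum_step(w_mom, v, grad_f(w_mom))   # with momentum

# Momentum ends up much closer to the minimum at the origin,
# because velocity keeps building along the shallow direction.
print(np.linalg.norm(w_mom), np.linalg.norm(w_plain))
```

Plain gradient descent crawls along the shallow axis at the small learning rate the steep axis forces on it; the accumulated velocity lets momentum make steady progress there anyway.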

Last updated: March 5, 2026