AI Glossary

Model Sharding

Splitting a large model across multiple devices so each device holds only a portion of the parameters.

Overview

Model sharding distributes a large neural network's parameters across multiple GPUs or machines, enabling training and inference of models that are too large to fit on a single device. Common sharding strategies include tensor parallelism (splitting individual layers across devices), pipeline parallelism (assigning different layers to different devices), and expert parallelism (distributing a mixture-of-experts model's experts across devices).
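The core idea behind tensor parallelism can be illustrated with a minimal sketch. This is a hypothetical single-machine simulation using NumPy, with "devices" represented as separate array shards; a real system would place each shard on its own accelerator and use collective communication to combine results.

```python
import numpy as np

# Hypothetical sketch: column-wise tensor parallelism for one linear layer,
# with "devices" simulated as separate array shards on a single machine.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))        # batch of activations
W = rng.normal(size=(8, 6))        # full weight matrix of the layer

# Shard W column-wise across two simulated devices.
W_shards = np.split(W, 2, axis=1)  # each "device" holds an (8, 3) slice

# Each device computes its partial output independently...
partial_outputs = [x @ W_k for W_k in W_shards]

# ...then the partial outputs are concatenated (an all-gather collective
# in a real multi-device system).
y_sharded = np.concatenate(partial_outputs, axis=1)

# The sharded result matches the unsharded computation.
assert np.allclose(y_sharded, x @ W)
```

Each device stores and computes with only half of `W`, which is what lets layers larger than a single device's memory be trained and served.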

Key Details

Effective sharding requires careful consideration of communication overhead, load balancing, and memory distribution. Frameworks like Megatron-LM, DeepSpeed, FSDP (Fully Sharded Data Parallel), and JAX's pjit provide automated sharding strategies. Model sharding is essential for training and serving frontier models with hundreds of billions of parameters.

Related Concepts

tensor parallelism, pipeline parallelism, distributed training
Last updated: March 5, 2026