AI Glossary

Tensor Parallelism

A distributed training strategy that splits individual layers of a model across multiple GPUs, enabling training of models too large to fit on a single device.

How It Works

The large matrix multiplications inside attention and feed-forward layers are partitioned across GPUs. Each GPU computes a partial result on its shard, and the partial results are then combined via collective communication (e.g., all-reduce). Because this communication happens on every forward and backward pass, tensor parallelism requires fast inter-GPU interconnects such as NVLink.
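The split-compute-combine pattern above can be sketched on a CPU with NumPy. This is a minimal illustration, not a real multi-GPU implementation: the two "devices" are just array shards, and the all-reduce is an element-wise sum. It shows a row-parallel linear layer, where the weight matrix is split along its rows (the contraction dimension), each shard computes a partial product, and summing the partials recovers the full result.

```python
import numpy as np

# Simulate row-parallel tensor parallelism with 2 hypothetical "GPUs".
# A linear layer y = x @ W is sharded along the inner dimension:
# each device holds half the input features and the matching rows of W.

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))    # batch of 4 examples, 8 features
W = rng.standard_normal((8, 6))    # full weight matrix (8 in, 6 out)

x_shards = np.split(x, 2, axis=1)  # each "GPU" gets half the features
W_shards = np.split(W, 2, axis=0)  # and the corresponding rows of W

# Each device computes its partial product independently...
partials = [xs @ ws for xs, ws in zip(x_shards, W_shards)]

# ...then an all-reduce (here: an element-wise sum) combines them.
y = sum(partials)

# The sharded computation matches the unsharded one.
assert np.allclose(y, x @ W)
```

On real hardware the sum is performed by a collective all-reduce over the interconnect, which is why the speed of links like NVLink matters.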

Usage

Essential for training and serving the largest models (roughly 70B+ parameters). It is typically combined with pipeline parallelism and data parallelism in so-called 3D parallelism strategies. Megatron-LM pioneered tensor parallelism for transformers.

Last updated: March 5, 2026