GPU Cluster
A computing infrastructure of many interconnected GPUs used to train large AI models; the essential hardware for frontier AI development.
Scale
Frontier model training uses clusters of thousands to tens of thousands of GPUs; Meta's training infrastructure exceeds 600,000 GPUs. A single H100 costs roughly $30-40K, so a 10,000-GPU cluster costs $300-400M+ before networking and operations.
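The cost figures above can be reproduced with a back-of-envelope calculation. This is an illustrative sketch: the per-GPU price range comes from the text, and the function name and interface are assumptions for the example, not a standard tool.

```python
# Back-of-envelope GPU-only cost estimate for a cluster, using the
# illustrative $30-40K per-H100 price range quoted above. Excludes
# networking, storage, cooling, and operations.

def cluster_gpu_cost(num_gpus: int,
                     price_low: float = 30_000,
                     price_high: float = 40_000) -> tuple[float, float]:
    """Return (low, high) hardware cost in dollars for the GPUs alone."""
    return num_gpus * price_low, num_gpus * price_high

low, high = cluster_gpu_cost(10_000)
print(f"GPU-only cost: ${low / 1e6:.0f}M-${high / 1e6:.0f}M")  # $300M-$400M
```

The real total is meaningfully higher: interconnects, storage, facilities, and power typically add a substantial fraction on top of the GPU hardware itself.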
Key Components
GPU nodes (typically 8 GPUs per node), high-bandwidth interconnects (NVLink within a node, InfiniBand between nodes), high-speed storage (parallel file systems), cooling infrastructure, and cluster management software.