Glossary
InfiniBand
A high-throughput, low-latency network fabric (Mellanox, now NVIDIA) used for inter-node communication in AI training clusters, supporting RDMA for direct GPU-to-GPU transfer across machines.
A switched-fabric network technology designed for HPC and now the
dominant inter-node fabric for AI training. Recent and planned
generations (NDR, XDR, GDR) deliver 400, 800, and 1600 Gbps per
port respectively. The standout property is RDMA: a GPU on one node
can read or write memory on a GPU on another node without involving
the host CPUs, removing the kernel from the critical path.
InfiniBand is the network behind nearly every announced large-scale training cluster: Meta’s H100 clusters, xAI’s Colossus, OpenAI’s Stargate, and frontier-lab production fleets. The bandwidth is what makes large-scale data-parallel and model-parallel training viable.
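To make the bandwidth point concrete, here is a rough back-of-the-envelope sketch of how long one gradient synchronization would take at each per-port rate. It assumes a ring all-reduce, one port per node, and a bandwidth-only cost model with no latency, compute overlap, or gradient sharding. The function name, the 70-billion-parameter model, and the 64-node count are hypothetical values chosen for illustration, not figures from any real cluster.

```python
def ring_allreduce_seconds(payload_bytes: float, num_nodes: int, link_gbps: float) -> float:
    """Bandwidth-only estimate for a ring all-reduce (assumed cost model)."""
    if num_nodes < 2:
        return 0.0  # nothing to exchange with a single node
    link_bytes_per_s = link_gbps * 1e9 / 8  # Gbps -> bytes per second
    # In a ring all-reduce each node sends 2 * (N - 1) / N of the payload.
    traffic_bytes = 2 * (num_nodes - 1) / num_nodes * payload_bytes
    return traffic_bytes / link_bytes_per_s


# Hypothetical workload: 70e9 parameters with 2-byte (bf16) gradients.
GRADIENT_BYTES = 70e9 * 2
NUM_NODES = 64  # hypothetical node count, one port per node assumed

for name, gbps in [("NDR", 400), ("XDR", 800), ("GDR", 1600)]:
    seconds = ring_allreduce_seconds(GRADIENT_BYTES, NUM_NODES, gbps)
    print(f"{name} ({gbps} Gbps): {seconds:.2f} s per full gradient all-reduce")
```

Under these assumptions the estimate scales inversely with link rate, so each doubling of per-port bandwidth halves the synchronization time. Real nodes typically have several ports and overlap communication with compute, so actual figures would differ.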
The credible alternative is Ethernet-based RDMA (RoCE), which has matured into a real competitor and lets clusters reuse standard Ethernet operations expertise. Whether InfiniBand stays dominant or loses to RoCE on price and openness is one of the more interesting infrastructure questions through 2026.