The Open-Source AI Stack

Glossary

NVLink

NVIDIA's proprietary GPU-to-GPU interconnect, providing roughly an order of magnitude more bandwidth than PCIe and forming the basis for tightly-coupled 8-GPU server nodes (DGX, HGX).

A high-bandwidth, point-to-point interconnect between NVIDIA GPUs in the same server. Hopper’s NVLink 4 reaches 900 GB/s aggregate (bidirectional) per GPU; Blackwell’s NVLink 5 reaches 1.8 TB/s. Combined with NVSwitch, which provides all-to-all connectivity across 8 GPUs (or 72 with NVL72) in one fabric, NVLink turns a server into a tightly-coupled compute domain rather than a loosely-connected cluster.

The relevance for AI: tensor parallelism requires a fast all-reduce across the participating GPUs after each sharded layer, and pipeline parallelism needs fast point-to-point transfers of activations between stages. Without NVLink, PCIe (about 64 GB/s per direction per GPU for Gen5 x16) becomes the bottleneck; over NVLink, communication time shrinks to a small fraction of the time spent in the matmuls themselves. Multi-GPU training and large-model inference depend on NVLink-class interconnect to scale.
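To make the bottleneck concrete, here is a rough, bandwidth-only cost model of a ring all-reduce. The ring algorithm is an assumption chosen for illustration (collective libraries such as NCCL may pick other algorithms, e.g. trees). In a ring over n GPUs, each GPU sends and receives 2(n-1)/n of the buffer, so time ≈ 2(n-1)/n · S / B. The per-direction bandwidths (64 GB/s for PCIe Gen5 x16, and half of the 900 GB/s and 1.8 TB/s aggregate NVLink figures) and the 1 GB buffer are illustrative assumptions. The model ignores latency, protocol overhead, and PCIe topology effects such as shared switches, which usually make PCIe worse in practice.

```python
def ring_allreduce_seconds(num_gpus: int, message_bytes: float, link_bytes_per_s: float) -> float:
    """Bandwidth-only estimate of a ring all-reduce.

    Each GPU sends (and receives) 2 * (n - 1) / n of the message over its link:
    (n - 1) steps of reduce-scatter plus (n - 1) steps of all-gather, each moving
    1/n of the buffer. Latency and protocol overhead are ignored.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be >= 1")
    if num_gpus == 1:
        return 0.0  # nothing to exchange
    bytes_on_wire = 2 * (num_gpus - 1) / num_gpus * message_bytes
    return bytes_on_wire / link_bytes_per_s


GB = 1e9  # decimal gigabyte, matching vendor bandwidth figures

# Assumed per-direction bandwidth per GPU (illustrative, not measured).
LINKS = {
    "PCIe Gen5 x16": 64 * GB,
    "NVLink 4 (Hopper)": 450 * GB,     # half of 900 GB/s aggregate
    "NVLink 5 (Blackwell)": 900 * GB,  # half of 1.8 TB/s aggregate
}

if __name__ == "__main__":
    message = 1 * GB  # hypothetical 1 GB buffer to all-reduce
    for name, bandwidth in LINKS.items():
        t = ring_allreduce_seconds(8, message, bandwidth)
        print(f"{name:22s} {t * 1e3:6.2f} ms")
```

Under these assumptions, an 8-GPU all-reduce of 1 GB takes about 27 ms over PCIe versus about 3.9 ms over NVLink 4 and 1.9 ms over NVLink 5. The ratio, not the absolute numbers, is the point: the same collective that stalls on PCIe becomes cheap enough to hide behind compute on NVLink.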

Competing fabrics exist (AMD Infinity Fabric, custom hyperscaler interconnects) but NVLink has the deepest software ecosystem. The new NVL72 architecture for Blackwell extends the NVLink domain to 72 GPUs in a single rack, blurring the line between “node” and “cluster” for the largest models.
