The Open-Source AI Stack


NVIDIA H100 SXM5

NVIDIA

SXM5 board with HBM3. NVIDIA's datasheet publishes tensor throughput with sparsity; the dense values listed here are those sparse figures halved. Production nodes typically link 8 GPUs over NVLink/NVSwitch.
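The halving rule can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the with-sparsity datasheet figures for this part are 1979 TFLOPS (FP16) and 3958 TFLOPS (FP8) and that structured sparsity contributes exactly a 2× factor; the function name is illustrative and not taken from this site.

```python
def dense_from_sparse(sparse_tflops: float, sparsity_factor: float = 2.0) -> float:
    """Convert a with-sparsity tensor figure to its dense equivalent.

    Assumes the published sparse number is exactly sparsity_factor times
    the dense one (2x for 2:4 structured sparsity).
    """
    return sparse_tflops / sparsity_factor


# Assumed with-sparsity datasheet inputs (not listed on this page).
fp16_dense = dense_from_sparse(1979.0)
fp8_dense = dense_from_sparse(3958.0)

print(fp16_dense, fp8_dense)  # 989.5 1979.0
```

The FP16 result of 989.5 appears on this page as 989, so the listed value looks truncated rather than rounded.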

[Diagram: compute units (989 TF FP16), 3.35 TB/s memory bus, 80 GB HBM3 (VRAM); ×8 over NVLink]
Datacenter. Bus width tracks bandwidth (3.35 TB/s, sets decode speed); the box tracks capacity (80 GB, sets what fits). Drawn per unit; a typical node is 8×.

Specs

Memory: 80 GB HBM3
Bandwidth: 3.35 TB/s
FP16 dense: 989 TFLOPS
FP8 dense: 1979 TFLOPS
Power: 700 W
Form factor: SXM
Interconnect: NVLink
Released: 2022-09

What it runs (8× unit, Q4_K_M, 4K context)

Model                    Fits?   Decode ceiling
Llama 3.1 8B Instruct    yes     ~4792 tok/s
Llama 3.3 70B Instruct   yes     ~594 tok/s
Qwen 2.5 72B Instruct    yes     ~577 tok/s
DeepSeek-V3              yes     ~1148 tok/s
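One way to read these ceilings is as aggregate memory bandwidth divided by the bytes that must be streamed for each generated token. The sketch below assumes the ceiling is purely bandwidth-bound and that the bandwidth of all 8 GPUs adds up perfectly; it then back-solves the bytes per token each table row implies. The function names are hypothetical, and the exact formula the site uses is not stated here.

```python
GPU_BANDWIDTH_TBPS = 3.35  # per-GPU memory bandwidth from the spec list
GPUS_PER_NODE = 8          # the 8x unit the table is computed for

# Assumption: bandwidth aggregates perfectly across the node.
NODE_BANDWIDTH_BYTES_S = GPU_BANDWIDTH_TBPS * 1e12 * GPUS_PER_NODE


def decode_ceiling_tok_s(bytes_per_token: float,
                         bandwidth_bytes_s: float = NODE_BANDWIDTH_BYTES_S) -> float:
    """Upper bound on decode rate if each token streams bytes_per_token from memory."""
    return bandwidth_bytes_s / bytes_per_token


def implied_gb_per_token(tok_s: float,
                         bandwidth_bytes_s: float = NODE_BANDWIDTH_BYTES_S) -> float:
    """Invert the ceiling: gigabytes read per token implied by a given tok/s."""
    return bandwidth_bytes_s / tok_s / 1e9


# Decode ceilings copied from the table above.
table_tok_s = {
    "Llama 3.1 8B Instruct": 4792,
    "Llama 3.3 70B Instruct": 594,
    "Qwen 2.5 72B Instruct": 577,
    "DeepSeek-V3": 1148,
}

for model, tok_s in table_tok_s.items():
    print(f"{model}: ~{implied_gb_per_token(tok_s):.1f} GB read per token")
```

Under these assumptions the back-solved figure for DeepSeek-V3 comes out smaller than for the two 70B-class models even though its ceiling is higher, which is what a mixture-of-experts model that reads only part of its weights per token would produce.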

Ceiling is the theoretical roofline (a performance model that bounds throughput by either compute or memory bandwidth, whichever is the limiting resource for an operation's arithmetic intensity); open the explorer to set quant, context, and runtime and see the realistic range.
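The roofline bound itself is a one-line minimum. The sketch below applies it to a single GPU using the FP16 dense and bandwidth figures from the spec list; the function name and the two example intensities are illustrative assumptions, not values published on this page.

```python
PEAK_FP16_TFLOPS = 989.0  # FP16 dense, from the spec list
BANDWIDTH_TBPS = 3.35     # memory bandwidth, from the spec list


def roofline_tflops(intensity_flop_per_byte: float,
                    peak_tflops: float = PEAK_FP16_TFLOPS,
                    bandwidth_tbps: float = BANDWIDTH_TBPS) -> float:
    """Attainable TFLOPS at a given arithmetic intensity (FLOP per byte moved).

    FLOP/byte times TB/s gives TFLOP/s, so the units line up directly.
    """
    return min(peak_tflops, intensity_flop_per_byte * bandwidth_tbps)


# Ridge point: the intensity at which the limit switches from memory to compute.
ridge_flop_per_byte = PEAK_FP16_TFLOPS / BANDWIDTH_TBPS

print(f"ridge point: ~{ridge_flop_per_byte:.0f} FLOP/byte")
print(roofline_tflops(1.0))     # low intensity: limited by bandwidth
print(roofline_tflops(1000.0))  # high intensity: limited by peak compute
```

An operation whose intensity sits below the ridge point is limited by the 3.35 TB/s memory bus; one above it is limited by the 989 TFLOPS peak.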

Sources