NVIDIA H100 SXM5 · Hardware · The Open-Source AI Stack

Specs

Memory: 80 GB HBM3
Bandwidth: 3.35 TB/s
FP16 dense: 989 TFLOPS
FP8 dense: 1979 TFLOPS
Power: 700 W
Form factor: SXM
Interconnect: NVLink
Released: 2022-09
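
The bandwidth and compute figures above imply a "ridge point": the arithmetic intensity (FLOPs per byte moved) below which a kernel is limited by memory bandwidth rather than compute. The following is a minimal sketch of that arithmetic using only the values listed in the specs; the function and variable names are illustrative, not taken from any tool.

```python
# Spec values copied from the list above.
BANDWIDTH_TBPS = 3.35       # HBM3 memory bandwidth, TB/s
FP16_DENSE_TFLOPS = 989.0   # dense FP16 throughput, TFLOPS
FP8_DENSE_TFLOPS = 1979.0   # dense FP8 throughput, TFLOPS


def ridge_point(tflops: float, tbps: float) -> float:
    """Return FLOPs per byte at which compute and bandwidth limits meet.

    Both inputs carry the same 1e12 prefix, so it cancels in the ratio.
    """
    return tflops / tbps


fp16_ridge = ridge_point(FP16_DENSE_TFLOPS, BANDWIDTH_TBPS)
fp8_ridge = ridge_point(FP8_DENSE_TFLOPS, BANDWIDTH_TBPS)

print(f"FP16 ridge point: {fp16_ridge:.0f} FLOP/byte")
print(f"FP8 ridge point:  {fp8_ridge:.0f} FLOP/byte")
```

Single-stream decode performs only a few FLOPs per weight byte read, far below either ridge point, which is why decode throughput on this card is typically discussed in terms of bandwidth rather than TFLOPS.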

What it runs (8× unit, Q4_K_M, 4K context)

Model	Fits?	Decode ceiling
Llama 3.1 8B Instruct	yes	~4792 tok/s
Llama 3.3 70B Instruct	yes	~594 tok/s
Qwen 2.5 72B Instruct	yes	~577 tok/s
DeepSeek-V3	yes	~1148 tok/s

Ceiling is the theoretical roofline: a performance model that bounds throughput by either compute or memory bandwidth, whichever is the limiting resource for an operation's arithmetic intensity. Open the explorer to set quant, context, and runtime and see the realistic range.
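
As a rough illustration of how a bandwidth-bound decode ceiling can be estimated, the sketch below divides aggregate memory bandwidth by the bytes read per generated token. It is not the formula behind the table above. It assumes every weight is read once per token (a dense model; a mixture-of-experts model such as DeepSeek-V3 reads only its active parameters), that bandwidth adds up perfectly across GPUs, and that the quantized model averages 4.5 bits per weight, which is a placeholder value rather than the exact Q4_K_M figure. The function name and parameters are hypothetical.

```python
def decode_ceiling_tok_s(
    n_params: float,
    bits_per_weight: float,
    bandwidth_tbps: float,
    n_gpus: int = 1,
    kv_bytes_per_token: float = 0.0,
) -> float:
    """Estimate a bandwidth-bound upper limit on decode tokens per second.

    n_params            parameters read per token (assumed: all of them)
    bits_per_weight     average storage cost per parameter (assumption)
    bandwidth_tbps      per-GPU memory bandwidth in TB/s
    n_gpus              GPUs whose bandwidth is assumed to add up perfectly
    kv_bytes_per_token  optional KV-cache bytes read per token
    """
    weight_bytes = n_params * bits_per_weight / 8
    bytes_per_token = weight_bytes + kv_bytes_per_token
    total_bandwidth = bandwidth_tbps * 1e12 * n_gpus  # bytes per second
    return total_bandwidth / bytes_per_token


# Hypothetical 70B-parameter dense model on 8 GPUs at 3.35 TB/s each.
ceiling = decode_ceiling_tok_s(70e9, 4.5, 3.35, n_gpus=8)
print(f"~{ceiling:.0f} tok/s")
```

Because the bits-per-weight and KV-cache inputs here are guesses, the result will not match the table exactly; the point is only that the ceiling scales with bandwidth and inversely with the bytes touched per token.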

Sources

NVIDIA H100 Tensor Core GPU datasheet