The Open-Source AI Stack


NVIDIA H100 PCIe

NVIDIA

The PCIe card uses HBM2e (the SXM5 part uses HBM3) and has lower memory bandwidth and a lower power envelope than the SXM5 part. Dense tensor figures are the datasheet's sparse values halved.
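The halving rule is simple arithmetic, shown here as a quick check. The sparse inputs are an assumption: they are the listed dense figures doubled (1513 TFLOPS FP16, 3026 TFLOPS FP8), which, as far as we know, match the "with sparsity" values on NVIDIA's H100 PCIe datasheet. The factor of two comes from 2:4 structured sparsity, which lets the tensor cores skip half of the multiplies.

```python
# Sketch: derive dense tensor throughput from datasheet "with sparsity" figures.
# Assumption: these sparse values are the listed dense numbers doubled.
SPARSE_TFLOPS = {"FP16": 1513, "FP8": 3026}


def dense_from_sparse(sparse_tflops: float) -> float:
    """2:4 structured sparsity doubles the peak rate, so dense = sparse / 2."""
    return sparse_tflops / 2


dense = {fmt: dense_from_sparse(v) for fmt, v in SPARSE_TFLOPS.items()}
print(dense)  # {'FP16': 756.5, 'FP8': 1513.0}
```

The FP16 result is 756.5 TFLOPS; the spec list shows it rounded down to 756.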

Diagram (datacenter class): compute units 756 TF FP16; memory bus 2.00 TB/s; VRAM 80 GB HBM2e. The bus width tracks bandwidth (2.00 TB/s, which sets decode speed); the box size tracks capacity (80 GB, which sets what fits).

Specs

Memory: 80 GB HBM2e
Bandwidth: 2.00 TB/s
FP16 dense: 756 TFLOPS
FP8 dense: 1513 TFLOPS
Power: 350 W
Form factor: PCIe
Interconnect: PCIe
Released: 2022-09

What it runs (single unit, Q4_K_M, 4K context)

Model                   Fits?  Decode ceiling
Llama 3.1 8B Instruct   yes    ~397 tok/s
Llama 3.3 70B Instruct  yes    ~49 tok/s
Qwen 2.5 72B Instruct   yes    ~48 tok/s
DeepSeek-V3             no     n/a
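A minimal sketch of how ceilings like these can be estimated: in single-stream decode, every generated token streams all weights plus the KV cache from VRAM once, so tokens/s is at most bandwidth divided by those bytes, and the same byte count decides whether the model fits. Several inputs are assumptions, not the site's published formula: about 4.5 bits per weight for Q4_K_M (the real mix runs slightly higher because some tensors stay at Q6_K), nominal parameter counts (8B, 70B, 72B, 671B), an FP16 KV cache filled to 4096 tokens, and layer, KV-head, and head-dimension values taken from the models' public configs. Under these assumptions the numbers land on the table's figures.

```python
from dataclasses import dataclass

BANDWIDTH_BYTES_S = 2.00e12  # 2.00 TB/s HBM2e
CAPACITY_BYTES = 80e9        # 80 GB VRAM
BITS_PER_WEIGHT = 4.5        # assumption: rough Q4_K_M average
CONTEXT_TOKENS = 4096        # 4K context, KV cache assumed full
KV_BYTES_PER_ELEM = 2        # assumption: FP16 KV cache


@dataclass
class Model:
    name: str
    params: float  # nominal parameter count
    layers: int
    kv_heads: int
    head_dim: int

    def weight_bytes(self) -> float:
        return self.params * BITS_PER_WEIGHT / 8

    def kv_bytes(self) -> float:
        # K and V tensors, for every layer, KV head, and cached token.
        return (2 * self.layers * self.kv_heads * self.head_dim
                * KV_BYTES_PER_ELEM * CONTEXT_TOKENS)

    def footprint(self) -> float:
        return self.weight_bytes() + self.kv_bytes()


def fits(m: Model) -> bool:
    """Capacity check: weights plus KV cache must fit in VRAM."""
    return m.footprint() <= CAPACITY_BYTES


def decode_ceiling(m: Model) -> float:
    """Memory-bound ceiling: one full pass over weights and KV per token."""
    return BANDWIDTH_BYTES_S / m.footprint()


MODELS = [
    Model("Llama 3.1 8B Instruct", 8e9, 32, 8, 128),
    Model("Llama 3.3 70B Instruct", 70e9, 80, 8, 128),
    Model("Qwen 2.5 72B Instruct", 72e9, 80, 8, 128),
]

for m in MODELS:
    print(f"{m.name:24s} fits={fits(m)} ~{decode_ceiling(m):.0f} tok/s")

# DeepSeek-V3: the weights alone exceed VRAM, so its KV layout is irrelevant here.
deepseek_weight_bytes = 671e9 * BITS_PER_WEIGHT / 8
print(f"DeepSeek-V3 weights: {deepseek_weight_bytes / 1e9:.0f} GB, "
      f"fits={deepseek_weight_bytes <= CAPACITY_BYTES}")
```

Because the ceiling divides by the whole footprint, larger models in this class lose decode speed almost linearly with their size, while the 80 GB capacity only decides the yes/no column.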

The decode ceiling is the theoretical roofline: a performance model that bounds throughput by either compute or memory bandwidth, whichever is the limiting resource for an operation's arithmetic intensity. Open the explorer to set quant, context, and runtime and see the realistic range.
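The roofline idea can be made concrete with this card's two headline numbers. A minimal sketch, assuming the FP16 dense peak (756 TFLOPS) is the relevant compute roof and that batch-1 decode does about 2 FLOPs (a multiply and an add) per weight while reading about 4.5/8 bytes per weight, a rough Q4_K_M figure with the KV cache ignored:

```python
PEAK_FLOPS = 756e12  # FP16 dense, FLOP/s
BANDWIDTH = 2.00e12  # bytes/s


def attainable_flops(intensity: float) -> float:
    """Roofline: capped by peak compute or by bandwidth x arithmetic intensity."""
    return min(PEAK_FLOPS, intensity * BANDWIDTH)


def bound(intensity: float) -> str:
    """Name the resource that limits an operation of this intensity (FLOP/byte)."""
    return "compute" if intensity * BANDWIDTH >= PEAK_FLOPS else "memory"


# Ridge point: the intensity where the bandwidth slope meets the compute roof.
ridge = PEAK_FLOPS / BANDWIDTH
print(ridge)  # 378.0

# Assumption: ~2 FLOPs per weight over ~4.5/8 bytes per weight at batch 1.
decode_intensity = 2 / (4.5 / 8)
print(bound(decode_intensity))  # memory
```

At roughly 3.6 FLOP/byte, single-stream decode sits about two orders of magnitude below the 378 FLOP/byte ridge point, which is why the ceiling above is set by bandwidth rather than TFLOPS.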

Sources