The Open-Source AI Stack
RSS
All hardware

Hardware · Datacenter

NVIDIA B200

NVIDIA

Blackwell-generation datacenter GPU. Datasheet headlines sparse tensor numbers; dense values here are the sparse figures halved. Usable HBM3e is about 180 GB air-cooled. The GB200 superchip pairs two B200 dies with a Grace CPU; approximate it by selecting two units.

×8 over NVSwitch Compute units 2250 TF FP16 7.70 TB/s memory bus 180 GB HBM3e (VRAM)
Datacenter. Bus width tracks bandwidth (7.70 TB/s, sets decode speed); the box tracks capacity (180 GB, sets what fits). Drawn per unit; a typical node is 8×.

Specs

Memory
180 GB HBM3e
Bandwidth
7.70 TB/s
FP16 dense
2250 TFLOPS
FP8 dense
4500 TFLOPS
Power
1000 W
Form factor
sxm
Interconnect
nvswitch
Released
2025-01

What it runs (8× unit, Q4_K_M, 4K context)

Model Fits? Decode ceiling
Llama 3.1 8B Instruct yes ~11259 tok/s
Llama 3.3 70B Instruct yes ~1396 tok/s
Qwen 2.5 72B Instruct yes ~1356 tok/s
DeepSeek-V3 yes ~2698 tok/s

Ceiling is the theoretical rooflineruntimeA performance model that bounds throughput by either compute or memory bandwidth, whichever is the limiting resource for an operation's arithmetic intensity. Open full entry ; open the explorer to set quant, context, and runtime and see the realistic range.

Sources