The Open-Source AI Stack


NVIDIA RTX PRO 6000 Blackwell

NVIDIA

Workstation Blackwell with 96 GB GDDR7: the most local model capacity on a single card with workstation-class bandwidth. Compute TFLOPS are omitted because NVIDIA publishes only an effective-FP4-with-sparsity headline figure. The Workstation and Server editions draw 600 W and deliver 1792 GB/s; the Max-Q edition draws 300 W with identical memory, so its decode behavior matches.

[Diagram, Workstation edition: compute units connected by a 1.79 TB/s memory bus to 96 GB GDDR7 (VRAM). Bus width tracks bandwidth (1.79 TB/s, sets decode speed); the box tracks capacity (96 GB, sets what fits).]

Specs

Memory: 96 GB GDDR7
Bandwidth: 1.79 TB/s
Power: 600 W
Form factor: PCIe
Interconnect: PCIe
Released: 2025-03

What it runs (single unit, Q4_K_M, 4K context)

Model                     Fits?   Decode ceiling
Llama 3.1 8B Instruct     yes     ~356 tok/s
Llama 3.3 70B Instruct    yes     ~44 tok/s
Qwen 2.5 72B Instruct     yes     ~43 tok/s
DeepSeek-V3               no
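
The two columns of this table can be approximated with a short calculation. The sketch below is a minimal illustration, assuming that a model "fits" when its weights plus runtime overhead stay within the 96 GB of memory, and that the decode ceiling is memory bandwidth divided by the bytes read per generated token. The function names and the 40 GB example footprint are hypothetical and are not taken from this page; the page's own formula may account for KV cache and other terms differently.

```python
def fits(capacity_gb: float, weights_gb: float, overhead_gb: float = 0.0) -> bool:
    """Return True if weights plus runtime overhead (KV cache, buffers) fit in memory."""
    return weights_gb + overhead_gb <= capacity_gb


def decode_ceiling_tok_s(bandwidth_gb_s: float, bytes_per_token_gb: float) -> float:
    """Memory-bound upper limit on decode speed.

    Assumes every generated token streams bytes_per_token_gb from memory once,
    so tokens per second cannot exceed bandwidth divided by that amount.
    """
    if bytes_per_token_gb <= 0:
        raise ValueError("bytes_per_token_gb must be positive")
    return bandwidth_gb_s / bytes_per_token_gb


CAPACITY_GB = 96.0       # memory capacity from the spec list
BANDWIDTH_GB_S = 1792.0  # memory bandwidth from the description

# Hypothetical quantized footprint, for illustration only (not from this page).
example_weights_gb = 40.0

if fits(CAPACITY_GB, example_weights_gb):
    ceiling = decode_ceiling_tok_s(BANDWIDTH_GB_S, example_weights_gb)
    print(f"fits, decode ceiling about {ceiling:.1f} tok/s")
else:
    print("does not fit on a single card")
```

Because the ceiling depends only on bandwidth and bytes read per token, two cards with the same memory subsystem get the same number from this estimate regardless of their power limits.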

Ceiling is the theoretical roofline: a performance model that bounds throughput by either compute or memory bandwidth, whichever is the limiting resource for an operation's arithmetic intensity. Open the explorer to set quant, context, and runtime and see the realistic range.
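
The roofline bound itself is a one-line formula: attainable throughput is the smaller of peak compute and bandwidth multiplied by arithmetic intensity. The sketch below is a generic illustration of that model, not this site's implementation. Since compute TFLOPS are omitted on this page, `PEAK_FLOPS` is a loudly hypothetical placeholder, and the intensity values are likewise chosen only to show both regimes.

```python
def attainable_flops(peak_flops: float, bandwidth_bytes_s: float,
                     intensity_flops_per_byte: float) -> float:
    """Roofline bound: min(compute roof, bandwidth * arithmetic intensity)."""
    return min(peak_flops, bandwidth_bytes_s * intensity_flops_per_byte)


def limiting_resource(peak_flops: float, bandwidth_bytes_s: float,
                      intensity_flops_per_byte: float) -> str:
    """Name which roof caps throughput at the given arithmetic intensity."""
    ridge = peak_flops / bandwidth_bytes_s  # intensity where the two roofs meet
    return "memory bandwidth" if intensity_flops_per_byte < ridge else "compute"


BANDWIDTH_BYTES_S = 1.792e12  # 1792 GB/s, from the description
PEAK_FLOPS = 1.0e14           # HYPOTHETICAL placeholder; the page omits compute figures

# Hypothetical intensities: a low one (decode-like) and a high one (large-batch-like).
for intensity in (2.0, 100.0):
    bound = attainable_flops(PEAK_FLOPS, BANDWIDTH_BYTES_S, intensity)
    which = limiting_resource(PEAK_FLOPS, BANDWIDTH_BYTES_S, intensity)
    print(f"intensity {intensity:6.1f} FLOP/byte: {bound:.3e} FLOP/s, limited by {which}")
```

Single-stream decode does little arithmetic per byte of weights read, so it typically sits on the bandwidth-limited side of the ridge; that is why the decode ceiling can be stated from bandwidth alone.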

Sources