Specs
- Memory: 180 GB HBM3e
- Bandwidth: 7.70 TB/s
- FP16 dense: 2250 TFLOPS
- FP8 dense: 4500 TFLOPS
- Power: 1000 W
- Form factor: SXM
- Interconnect: NVSwitch
- Released: 2025-01
What it runs (8× unit, Q4_K_M, 4K context)
| Model | Fits? | Decode ceiling |
|---|---|---|
| Llama 3.1 8B Instruct | yes | ~11259 tok/s |
| Llama 3.3 70B Instruct | yes | ~1396 tok/s |
| Qwen 2.5 72B Instruct | yes | ~1356 tok/s |
| DeepSeek-V3 | yes | ~2698 tok/s |
Ceiling is the theoretical roofline: a performance model that bounds throughput by either compute or memory bandwidth, whichever is the limiting resource for an operation's arithmetic intensity. Open the explorer to set quant, context, and runtime and see the realistic range.
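As a rough illustration of how a roofline-style decode ceiling can be estimated, the sketch below assumes that single-stream decode is memory-bandwidth-bound, that every generated token reads the full model footprint (weights plus KV cache) once, and that bandwidth aggregates perfectly across all units. The page does not state its exact formula, so the function names and the 5.0 GB footprint are hypothetical and are not meant to reproduce the table values.

```python
def roofline_attainable(peak_flops: float, bandwidth_bytes_s: float,
                        intensity_flops_per_byte: float) -> float:
    """Attainable FLOP/s under the roofline model:
    the lower of the compute roof and bandwidth * arithmetic intensity."""
    return min(peak_flops, bandwidth_bytes_s * intensity_flops_per_byte)


def decode_ceiling_tok_s(bandwidth_tb_s: float, num_units: int,
                         gb_read_per_token: float) -> float:
    """Memory-bound decode ceiling in tokens/s.

    Assumption: each token requires reading `gb_read_per_token` GB once,
    and bandwidth scales linearly with the number of units.
    """
    aggregate_bytes_per_s = bandwidth_tb_s * 1e12 * num_units
    return aggregate_bytes_per_s / (gb_read_per_token * 1e9)


if __name__ == "__main__":
    # Values taken from the spec list on this page.
    bandwidth_tb_s = 7.70
    fp16_dense_tflops = 2250
    units = 8

    # Hypothetical footprint (quantized weights + KV cache), for illustration only.
    footprint_gb = 5.0

    ceiling = decode_ceiling_tok_s(bandwidth_tb_s, units, footprint_gb)
    print(f"memory-bound decode ceiling: {ceiling:.0f} tok/s")

    # Ridge point: the arithmetic intensity (FLOP/byte) above which a single
    # unit becomes compute-bound rather than bandwidth-bound at FP16.
    ridge = (fp16_dense_tflops * 1e12) / (bandwidth_tb_s * 1e12)
    print(f"FP16 ridge point: {ridge:.0f} FLOP/byte")
```

Real throughput sits below this bound because of interconnect overhead, kernel efficiency, and runtime scheduling, which is why the figures above are presented as ceilings.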