Hardware
Will it fit, and how fast?
Two questions decide whether you can run an open model on a given box. Memory capacity decides what fits. Memory bandwidth, the rate (GB/s or TB/s) at which an accelerator reads its memory, decides how fast it runs: it sets the ceiling on decode tokens/sec, since each token streams the active weights once. They are not the same number, and a box that holds a model can still be too slow to serve it.
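The bandwidth ceiling can be sketched in a few lines. This is a minimal estimate, not the explorer's own formula: the function name and the example model (a hypothetical 70B dense model at 4-bit, 0.5 bytes per parameter) are assumptions made for illustration, and the estimate ignores KV cache reads and runtime overhead, so real throughput will be lower.

```python
def decode_ceiling_tokens_per_sec(bandwidth_gb_s: float,
                                  active_params_b: float,
                                  bytes_per_param: float) -> float:
    """Upper bound on single-stream decode speed.

    Assumes each generated token streams the active weights from memory once,
    so the ceiling is bandwidth divided by the size of those weights.
    """
    # Billions of parameters times bytes per parameter gives gigabytes.
    active_weight_gb = active_params_b * bytes_per_param
    return bandwidth_gb_s / active_weight_gb


# Hypothetical 70B dense model at 4-bit: 35 GB of weights.
# Bandwidth figures are taken from the table on this page.
h100_sxm5 = decode_ceiling_tokens_per_sec(3350, 70, 0.5)    # roughly 96 tokens/sec
mac_mini_m4_pro = decode_ceiling_tokens_per_sec(273, 70, 0.5)  # roughly 8 tokens/sec
```

Under these assumptions the 35 GB of weights fit in both the 80 GB H100 SXM5 and the 64 GB Mac mini (M4 Pro), yet the ceilings differ by more than a factor of ten, which is the gap between "fits" and "fast enough".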
Pick a model, a quantization (weights stored in a lower-precision format such as FP8, INT8, or INT4), and a context length, then compare specific boxes, find every box a model fits on, or pick a box and list everything it can run. The explorer shows whether the model fits, a theoretical roofline tokens/sec ceiling with the formula, a realistic range, a measured anchor where one exists, and a plain-language reason that traces back to bandwidth, capacity, and KV cache (the stored key and value vectors from previously processed tokens). The method comes straight from the self-host course.
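A fit check along these lines adds the quantized weights to the KV cache for the chosen context length and compares the total against capacity. The sketch below uses the standard per-sequence KV cache size (keys plus values, per layer, per KV head); the function names, the `overhead_gb` parameter, and the model shape (80 layers, 8 KV heads, head dimension 128) are hypothetical values chosen for illustration, not figures from the explorer.

```python
def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                context_len: int, bytes_per_elem: float = 2.0) -> float:
    """KV cache size in GB for one sequence at full context length."""
    # Factor of 2: one key vector and one value vector per token.
    total_bytes = 2 * layers * kv_heads * head_dim * context_len * bytes_per_elem
    return total_bytes / 1e9


def fits(capacity_gb: float, params_b: float, bytes_per_param: float,
         kv_gb: float, overhead_gb: float = 0.0) -> tuple[bool, float]:
    """Return (fits, needed_gb). overhead_gb is an assumed runtime allowance."""
    needed_gb = params_b * bytes_per_param + kv_gb + overhead_gb
    return needed_gb <= capacity_gb, needed_gb


# Hypothetical 70B model at 4-bit with a 32,768-token context and FP16 cache.
kv = kv_cache_gb(layers=80, kv_heads=8, head_dim=128, context_len=32768)
ok_80gb, needed = fits(80, 70, 0.5, kv)   # 80 GB card
ok_32gb, _ = fits(32, 70, 0.5, kv)        # 32 GB card
```

With these inputs the cache adds about 10.7 GB to 35 GB of weights, so the model fits an 80 GB card but not a 32 GB one. Longer contexts or concurrent sequences grow the cache linearly while the weights stay fixed.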
Theoretical numbers are labeled as estimates and show their inputs; measured anchors carry a date and source. Every hardware spec flows through the same verification gate as /models: an unverified value renders as —, never a guess.
The spectrum
Per single unit. Compute figures are dense (never sparse). Bandwidth is the decode-speed determinant; capacity is the fit determinant.
| Hardware | Class | Memory | Bandwidth | FP16 dense | Power | Released |
|---|---|---|---|---|---|---|
| AMD Instinct MI355X | Datacenter | 288 GB HBM3e | 8.00 TB/s | 2517 TF | 1400 W | 2025-06 |
| NVIDIA B200 | Datacenter | 180 GB HBM3e | 7.70 TB/s | 2250 TF | 1000 W | 2025-01 |
| AMD Instinct MI325X | Datacenter | 256 GB HBM3e | 6.00 TB/s | 1307 TF | 1000 W | 2024-10 |
| AMD Instinct MI300X | Datacenter | 192 GB HBM3 | 5.30 TB/s | 1307 TF | 750 W | 2023-12 |
| NVIDIA H200 SXM | Datacenter | 141 GB HBM3e | 4.80 TB/s | 989 TF | 700 W | 2024-01 |
| NVIDIA H100 SXM5 | Datacenter | 80 GB HBM3 | 3.35 TB/s | 989 TF | 700 W | 2022-09 |
| NVIDIA H100 PCIe | Datacenter | 80 GB HBM2e | 2.00 TB/s | 756 TF | 350 W | 2022-09 |
| NVIDIA GeForce RTX 5090 | Workstation | 32 GB GDDR7 | 1.79 TB/s | 209.5 TF | 575 W | 2025-01 |
| NVIDIA RTX PRO 6000 Blackwell | Workstation | 96 GB GDDR7 | 1.79 TB/s | — | 600 W | 2025-03 |
| NVIDIA GeForce RTX 4090 | Workstation | 24 GB GDDR6X | 1.01 TB/s | 165.2 TF | 450 W | 2022-10 |
| AMD Radeon AI PRO R9700 | Workstation | 32 GB GDDR6 | 640 GB/s | 191 TF | 300 W | 2025-07 |
| Tenstorrent Blackhole p150a | Workstation | 32 GB GDDR6 | 512 GB/s | — | 300 W | 2025-08 |
| Apple Mac Studio (M3 Ultra) | Apple unified | 512 GB Unified LPDDR5X | 819 GB/s | — | 270 W | 2025-03 |
| Apple MacBook Pro (M5 Max) | Apple unified | 128 GB Unified LPDDR5X | 614 GB/s | — | 40 W | 2026-03 |
| Apple Mac Studio (M4 Max) | Apple unified | 128 GB Unified LPDDR5X | 546 GB/s | — | 160 W | 2025-03 |
| Apple MacBook Pro (M4 Max) | Apple unified | 128 GB Unified LPDDR5X | 546 GB/s | — | 40 W | 2024-10 |
| Apple Mac mini (M4 Pro) | Apple unified | 64 GB Unified LPDDR5X | 273 GB/s | — | 65 W | 2024-10 |
| Apple MacBook Air (M5) | Apple unified | 32 GB Unified LPDDR5X | 153 GB/s | — | 20 W | 2026-03 |
| NVIDIA DGX Spark (GB10) | x86 unified | 128 GB Unified LPDDR5X | 273 GB/s | — | 140 W | 2025-10 |
| AMD Ryzen AI Max+ 395 (Strix Halo) | x86 unified | 128 GB Unified LPDDR5X | 256 GB/s | — | 120 W | 2025-01 |
| Qualcomm Snapdragon X2 Elite | AI PC | 48 GB LPDDR5X | 228 GB/s | — | 80 W | 2025-09 |
| Intel Core Ultra 200V (Lunar Lake) | AI PC | 32 GB LPDDR5X | 136 GB/s | — | 37 W | 2024-09 |
| Qualcomm Snapdragon X Elite | AI PC | 64 GB LPDDR5X | 135 GB/s | — | 80 W | 2024-06 |
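
The "find every box a model fits on" view can be approximated directly from the table: filter on capacity, then rank by the bandwidth ceiling. This is an illustrative sketch only. The five rows are copied from the table above, while the function name and the workload (a hypothetical 46 GB total footprint, of which 35 GB is weights) are assumptions. It also treats all listed memory as available to the model, which overstates what unified-memory machines can actually allocate.

```python
# (name, memory_gb, bandwidth_gb_s) copied from a few rows of the table above.
BOXES = [
    ("NVIDIA H100 SXM5", 80, 3350),
    ("NVIDIA GeForce RTX 5090", 32, 1790),
    ("Apple Mac Studio (M3 Ultra)", 512, 819),
    ("NVIDIA DGX Spark (GB10)", 128, 273),
    ("Intel Core Ultra 200V (Lunar Lake)", 32, 136),
]


def boxes_that_fit(footprint_gb: float, weight_gb: float) -> list[tuple[str, float]]:
    """Return (name, ceiling tokens/sec) for boxes with enough memory, fastest first."""
    fitting = [
        (name, bandwidth / weight_gb)   # capacity passes, so rank by bandwidth
        for name, memory, bandwidth in BOXES
        if memory >= footprint_gb
    ]
    return sorted(fitting, key=lambda row: row[1], reverse=True)


# Hypothetical workload: 46 GB total footprint, 35 GB of it weights.
ranked = boxes_that_fit(footprint_gb=46.0, weight_gb=35.0)
for name, ceiling in ranked:
    print(f"{name}: up to {ceiling:.1f} tokens/sec (theoretical)")
```

Capacity acts as a filter and bandwidth as the sort key: the RTX 5090 has more than twice the bandwidth of the Mac Studio (M3 Ultra) but drops out because 32 GB cannot hold the workload.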
Cross-links: /stack/silicon · the self-host course modules on memory math, quantization, and inference engines.