The Open-Source AI Stack


Tenstorrent Blackhole p150a

Tenstorrent

Open-stack accelerator (RISC-V cores plus Tensix units); the whole software stack is open source. The compute figure is Tenstorrent's quoted BLOCKFP8 (block-scaled FP8), not standard E4M3/E5M2, so cross-vendor FP8 comparisons are not exact.
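To make the cross-vendor caveat concrete, here is a minimal sketch of generic block floating point, where every value in a block shares one exponent instead of carrying its own (as each element does in E4M3 or E5M2). The block size of 16 and the 7-bit magnitude are illustrative assumptions, and the function name block_quantize is hypothetical; this is not a description of Tenstorrent's actual BLOCKFP8 layout.

```python
import math

def block_quantize(values, block_size=16, mantissa_bits=7):
    """Round values to a shared-exponent grid, one exponent per block.

    Illustrative only: block_size and mantissa_bits are assumptions,
    not the parameters of any vendor's format.
    """
    out = []
    limit = 2 ** mantissa_bits - 1
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        max_abs = max(abs(v) for v in block)
        if max_abs == 0.0:
            out.extend([0.0] * len(block))
            continue
        # frexp gives max_abs = m * 2**exp with 0.5 <= m < 1, so 2**exp
        # is a power of two strictly above the largest magnitude.
        exp = math.frexp(max_abs)[1]
        step = 2.0 ** (exp - mantissa_bits)  # grid spacing shared by the block
        for v in block:
            q = max(-limit, min(limit, round(v / step)))
            out.append(q * step)
    return out

# A small value sharing a block with a large one loses its precision,
# while the same small value alone in a block is kept to within one step.
mixed = block_quantize([1.0, 0.001])
alone = block_quantize([0.001])
```

In this sketch the precision of each element depends on its neighbours in the block, which is one reason a block-scaled throughput figure and a per-element FP8 figure do not measure the same arithmetic.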

Figure: compute units, a 512 GB/s memory bus, and 32 GB of GDDR6 (VRAM).
Workstation. Bus width tracks bandwidth (512 GB/s, which sets decode speed); the box tracks capacity (32 GB, which sets what fits).

Specs

Memory: 32 GB GDDR6
Bandwidth: 512 GB/s
FP8 dense: 664 TFLOPS
Power: 300 W
Form factor: PCIe
Interconnect: PCIe
Released: 2025-08

What it runs (single unit, Q4_K_M, 4K context)

Model                    Fits?   Decode ceiling
Llama 3.1 8B Instruct    yes     ~102 tok/s
Llama 3.3 70B Instruct   no      n/a
Qwen 2.5 72B Instruct    no      n/a
DeepSeek-V3              no      n/a
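The two columns above can be approximated with back-of-envelope arithmetic: a model fits if its quantized weights plus some working memory stay under 32 GB, and the decode ceiling is bandwidth divided by the bytes read per generated token. The sketch below is not the page's own calculator. The 4.85 bits per weight for Q4_K_M, the parameter counts, and the 1 GB overhead allowance are all assumptions, and the function names are hypothetical.

```python
# Spec-sheet values from this page.
MEMORY_GB = 32.0
BANDWIDTH_GB_S = 512.0

# Assumption: Q4_K_M averages roughly 4.85 bits per weight.
BITS_PER_WEIGHT = 4.85

# Assumption: approximate total parameter counts, in billions.
PARAMS_B = {
    "Llama 3.1 8B Instruct": 8.0,
    "Llama 3.3 70B Instruct": 70.6,
    "Qwen 2.5 72B Instruct": 72.7,
    "DeepSeek-V3": 671.0,
}

def weights_gb(params_b, bits_per_weight=BITS_PER_WEIGHT):
    """Approximate size of the quantized weights in GB."""
    return params_b * bits_per_weight / 8.0

def fits(params_b, overhead_gb=1.0):
    """Capacity check; overhead_gb is an assumed allowance for KV cache and buffers."""
    return weights_gb(params_b) + overhead_gb <= MEMORY_GB

def decode_ceiling_tok_s(params_b, extra_read_gb=0.0):
    """Bandwidth-bound ceiling for a dense model at batch size 1.

    Assumes every weight is read once per generated token.
    """
    return BANDWIDTH_GB_S / (weights_gb(params_b) + extra_read_gb)

report = {name: fits(p) for name, p in PARAMS_B.items()}
ceiling_8b = decode_ceiling_tok_s(PARAMS_B["Llama 3.1 8B Instruct"])
```

Under these assumptions only the 8B model fits, and its ceiling comes out a little above 100 tok/s, in the same range as the table's ~102 tok/s. The exact figure depends on the footprint the page assumes, which is not stated here.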

Ceiling is the theoretical roofline (a performance model that bounds throughput by either compute or memory bandwidth, whichever is the limiting resource for an operation's arithmetic intensity); open the explorer to set quant, context, and runtime and see the realistic range.
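As a rough illustration of the roofline bound, the sketch below takes the smaller of peak compute and bandwidth times arithmetic intensity, using this page's spec values. The decode intensity is an assumption (about 2 FLOPs per weight and about 4.85 bits read per weight at batch size 1), and the peak is the quoted BLOCKFP8 figure, so treat the result as indicative rather than exact.

```python
PEAK_FLOPS = 664e12      # quoted BLOCKFP8 dense figure from the spec list
BANDWIDTH_B_S = 512e9    # 512 GB/s memory bandwidth

def roofline_flops(intensity_flop_per_byte):
    """Attainable FLOP/s: the lower of the compute roof and the bandwidth roof."""
    return min(PEAK_FLOPS, BANDWIDTH_B_S * intensity_flop_per_byte)

# Ridge point: the intensity at which the two roofs meet.
ridge = PEAK_FLOPS / BANDWIDTH_B_S

# Assumed decode intensity at batch size 1:
# 2 FLOPs per weight divided by (4.85 / 8) bytes per weight.
decode_intensity = 2.0 / (4.85 / 8.0)

attainable = roofline_flops(decode_intensity)
memory_bound = decode_intensity < ridge
```

With these inputs the assumed decode intensity sits far below the ridge point, so single-stream decode lands on the bandwidth roof. That is why the table's ceiling tracks the 512 GB/s figure rather than the TFLOPS figure.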

Sources