The Open-Source AI Stack
RSS
All hardware

Hardware · Workstation

NVIDIA GeForce RTX 4090

NVIDIA

Ada Lovelace flagship; the 24 GB workstation workhorse. FP16 dense tensor is 165.2 TFLOPS (the often-quoted 330 figure is with sparsity). No native FP8 tensor path.

Compute units 165.2 TF FP16 1.01 TB/s memory bus 24 GB GDDR6X (VRAM)
Workstation. Bus width tracks bandwidth (1.01 TB/s, sets decode speed); the box tracks capacity (24 GB, sets what fits).

Specs

Memory
24 GB GDDR6X
Bandwidth
1.01 TB/s
FP16 dense
165.2 TFLOPS
Power
450 W
Form factor
pcie
Interconnect
pcie
Released
2022-10

What it runs (single unit, Q4_K_M, 4K context)

Model Fits? Decode ceiling
Llama 3.1 8B Instruct yes ~200 tok/s
Llama 3.3 70B Instruct no
Qwen 2.5 72B Instruct no
DeepSeek-V3 no

Ceiling is the theoretical rooflineruntimeA performance model that bounds throughput by either compute or memory bandwidth, whichever is the limiting resource for an operation's arithmetic intensity. Open full entry ; open the explorer to set quant, context, and runtime and see the realistic range.

Sources