AMD MI300X / MI325X · The Open-Source AI Stack

AMD's data-center AI accelerator line. MI300X shipped in late 2023 with 192GB of HBM3, more memory than any other accelerator at the time; the MI325X refresh raised that to 256GB of HBM3e. The hardware is closed, but the surrounding ROCm software stack leans open source: most of ROCm is released under open-source licenses, whereas CUDA's source is closed.

Compared to NVIDIA H100/H200: MI300X has substantially more memory per accelerator (192GB vs 80GB for H100 and 141GB for H200), so it can hold larger models on a single device without sharding. Raw FLOPS are competitive. The gap is in software: ROCm lags CUDA by years on per-framework optimization. vLLM and other open runtimes support AMD, but production deployments still skew NVIDIA. AMD's positioning is "the credible non-NVIDIA option for inference at scale," not "the leader."

MI300X is production-ready and shipping at scale. Major hyperscaler deployments are confirmed: Microsoft Azure and Meta have both publicly run MI300X clusters for inference. Hugging Face treats ROCm as a first-class target. The strategic question for AMD is whether ROCm closes the software gap fast enough to break NVIDIA's lock-in before the next generation extends it.
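
The "larger models without sharding" point comes down to simple arithmetic, which the following minimal Python sketch illustrates. It uses only the capacities quoted above (80GB, 192GB, 256GB). The 70B-parameter model, the 16-bit precision, and the 80% headroom factor (a rough reserve for KV cache, activations, and runtime overhead) are illustrative assumptions, not figures from AMD or NVIDIA.

```python
# Back-of-envelope check: do a model's weights fit on a single accelerator?
# Capacities are the nominal GB figures quoted in the text; usable memory is lower.
HBM_GB = {"H100": 80, "MI300X": 192, "MI325X": 256}


def weight_gb(params_billion, bytes_per_param):
    """Weight footprint in GB: (params_billion * 1e9 * bytes_per_param) / 1e9."""
    return params_billion * bytes_per_param


def fits(params_billion, bytes_per_param, device, headroom=0.8):
    """True if the weights fit within `headroom` of the device's memory.

    `headroom` is an assumed fraction left for weights after reserving space
    for KV cache, activations, and runtime overhead.
    """
    return weight_gb(params_billion, bytes_per_param) <= HBM_GB[device] * headroom


# Hypothetical 70B-parameter model stored at 2 bytes per parameter (16-bit).
for device in HBM_GB:
    print(device, fits(70, 2, device))
```

Under these assumptions the weights alone need 140GB, which exceeds a single 80GB H100 but fits on one MI300X or MI325X. Real deployments also depend on context length, batch size, and quantization, so treat this as a first-pass estimate only.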

Sources

AMD Instinct MI300X Product Page https://www.amd.com/en/products/accelerators/instinct/mi300/mi300x.html

ROCm Documentation https://rocm.docs.amd.com/

Microsoft Azure ND MI300X Announcement https://azure.microsoft.com/en-us/blog/azure-announces-new-ai-optimized-vm-series-featuring-amds-flagship-mi300x-gpu/

AMD Instinct MI325X Product Page (audit-verified) https://www.amd.com/en/products/accelerators/instinct/mi300/mi325x.html

Microsoft Tech Community: Introducing the New Azure AI Infrastructure VM Series ND MI300X v5 (audit-verified) https://techcommunity.microsoft.com/blog/azurehighperformancecomputingblog/introducing-the-new-azure-ai-infrastructure-vm-series-nd-mi300x-v5/4145152

Engineering at Meta: Meta's Open AI Hardware Vision (audit-verified) https://engineering.fb.com/2024/10/15/data-infrastructure/metas-open-ai-hardware-vision/

Other projects at the Silicon layer

6 siblings · ordered open first

Tenstorrent (Wormhole, Blackhole) Open source

Open-leaning AI accelerators built on RISC-V; Jim Keller-led; the tt-metal and tt-forge software stacks are open source.

RISC-V Open source

Open instruction set architecture; royalty-free; substrate for open silicon (CPUs and emerging AI accelerators).

NVIDIA H100 / H200 Proprietary

Hyperscaler-class AI accelerator with CUDA software moat; default frontier-training and frontier-inference hardware.

Cerebras CS-3 Proprietary

Wafer-scale accelerator; proprietary but disruptive on inference economics for specific model sizes.

Groq LPU Proprietary

Language Processing Unit; proprietary; extraordinarily fast inference for small-to-medium models at low batch sizes.

Apple Silicon (M-series) Proprietary

Unified memory architecture; closed silicon, but the strongest on-device inference platform via llama.cpp and MLX.