NVIDIA H100 / H200 · The Open-Source AI Stack

The H100 and H200, both built on the Hopper architecture, are NVIDIA's data-center AI accelerators. The silicon is closed and so is the ISA: PTX is the published intermediate representation, while SASS is the actual hardware instruction set. Both parts are tightly bound to the CUDA software ecosystem. The H100 ships with 80 GB of HBM3 memory; the H200 is a refresh with 141 GB of HBM3e and higher memory bandwidth.

Why it dominates: not just the hardware. The CUDA software stack (cuDNN, cuBLAS, NCCL, TensorRT-LLM) has compounded developer attention for more than fifteen years. Nearly every framework, inference engine, and research codebase assumes CUDA as the default backend. The competing accelerators (AMD MI300X, Tenstorrent, Intel Gaudi) lag by years on per-framework optimization, not because the hardware is worse on paper, but because the kernels have not been written. This is the lock-in pattern named in the silicon-defeats-open-weights argument.

Status: production-ready, and the de facto frontier accelerator. The vast majority of frontier training runs in 2024-2026 used H100s or H200s; the subsequent Blackwell-series accelerators (B100/B200) are shipping, but H100/H200 remain the workhorse fleet. Both are available through every major hyperscaler and through dedicated AI clouds (CoreWeave, Lambda, and others). The street price for an H100 is typically $25-40K, and the cards are not sold to individuals at scale.
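The memory figures quoted above (80 GB on the H100, 141 GB on the H200) matter most when deciding whether a model's weights fit on a single card. The sketch below is a rough back-of-envelope check, not a sizing tool. It assumes decimal gigabytes, counts weights only (no KV cache, activations, or framework overhead), and uses hypothetical model sizes of 8B and 70B parameters that do not come from this page.

```python
# Capacities stated in the text above, taken as decimal gigabytes (an assumption).
HBM_GB = {"H100": 80, "H200": 141}

# Bytes per parameter for two common weight formats.
BYTES_PER_PARAM = {"fp16/bf16": 2, "fp8/int8": 1}


def weights_gb(params_billions: float, bytes_per_param: int) -> float:
    """Memory needed for the weights alone, in decimal GB."""
    # (params_billions * 1e9 params) * bytes_per_param / (1e9 bytes per GB)
    return params_billions * bytes_per_param


def fits(params_billions: float, bytes_per_param: int, gpu: str) -> bool:
    """True if the weights alone fit on one GPU.

    Ignores KV cache, activations, and runtime overhead, so a 'fits'
    verdict here is an upper bound on what is practical.
    """
    return weights_gb(params_billions, bytes_per_param) <= HBM_GB[gpu]


if __name__ == "__main__":
    for size in (8, 70):  # hypothetical model sizes, in billions of parameters
        for fmt, nbytes in BYTES_PER_PARAM.items():
            for gpu in HBM_GB:
                verdict = "fits" if fits(size, nbytes, gpu) else "does not fit"
                print(f"{size}B @ {fmt}: {weights_gb(size, nbytes):.0f} GB, "
                      f"{verdict} on one {gpu}")
```

Under these assumptions, a 70B-parameter model in 16-bit weights needs about 140 GB, which exceeds one H100 but sits just under the H200's stated capacity. In practice the remaining headroom would be too small for serving, so the weights-only figure should be read as a lower bound on real memory needs.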

Sources

NVIDIA H100 Product Page https://www.nvidia.com/en-us/data-center/h100/

NVIDIA H200 Product Page https://www.nvidia.com/en-us/data-center/h200/

CUDA Toolkit Documentation https://docs.nvidia.com/cuda/

intuitionlabs.ai (audit-verified) https://intuitionlabs.ai/articles/nvidia-ai-gpu-pricing-guide

markets.financialcontent.com (audit-verified) https://markets.financialcontent.com/wral/article/tokenring-2025-12-29-nvidias-blackwell-dynasty-b200-and-gb200-sold-out-through-mid-2026-as-backlog-hits-36-million-units

Other projects at the Silicon layer

6 siblings · ordered open first

Tenstorrent (Wormhole, Blackhole) Open source

Open-trending AI accelerators on RISC-V; Jim Keller-led; tt-metal and tt-forge open.

RISC-V Open source

Open instruction set architecture; royalty-free; substrate for open silicon (CPUs and emerging AI accelerators).

AMD MI300X / MI325X Proprietary

Highest-memory accelerator on the market (192 GB+ HBM); ROCm software stack open-source-adjacent.

Cerebras CS-3 Proprietary

Wafer-scale accelerator; proprietary but disruptive on inference economics for specific model sizes.

Groq LPU Proprietary

Language Processing Unit; proprietary; extraordinarily fast inference for small-to-medium models at low batch sizes.

Apple Silicon (M-series) Proprietary

Unified memory architecture; closed silicon, but the strongest on-device inference platform via llama.cpp and MLX.