Glossary

AI factory

A purpose-built data center optimized for AI training rather than general cloud workloads, characterized by liquid-cooled high-density GPU racks, gigawatt-scale single-tenant power, and tightly coupled networking.

Category: Infrastructure (also: Compute). Also known as: AI factory, purpose-built training facility.

A data center designed from the ground up for AI training, not for mixed cloud workloads. The architectural differences from a conventional hyperscaler facility are concrete: 80+ kW per rack versus 10 to 20 kW; direct-to-chip liquid cooling as standard; all-to-all NVLink and InfiniBand fabrics; single-tenant operation; power draw measured in gigawatts.
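To make the density gap concrete, here is a back-of-envelope sketch of how many racks a fixed power budget supports at the per-rack figures above. It assumes a hypothetical 1 GW facility, uses 15 kW as a stand-in for the conventional 10 to 20 kW range, and defines a hypothetical helper `racks_supported`. Its optional `pue` parameter (power usage effectiveness) stands in for cooling and power-conversion overhead. Real sites always have that overhead, and the default of 1.0 ignores it.

```python
# Back-of-envelope: how many racks a given facility power budget supports.
# Assumption (hypothetical): with pue=1.0, all facility power goes to IT racks;
# real sites also spend power on cooling and conversion losses (PUE > 1).


def racks_supported(facility_mw: float, kw_per_rack: float, pue: float = 1.0) -> int:
    """Number of whole racks a facility of `facility_mw` megawatts can power.

    `pue` divides the facility budget down to the power available for IT load.
    """
    if kw_per_rack <= 0 or pue < 1.0:
        raise ValueError("kw_per_rack must be positive and pue must be >= 1.0")
    it_kw = facility_mw * 1000 / pue  # MW -> kW, minus overhead
    return int(it_kw // kw_per_rack)


# A 1 GW (1000 MW) facility at two illustrative rack densities.
for label, kw in [("conventional, 15 kW/rack", 15.0), ("AI factory, 80 kW/rack", 80.0)]:
    print(f"{label}: {racks_supported(1000, kw):,} racks")
```

Under these assumptions, 1 GW feeds 12,500 racks at 80 kW each, versus 66,666 at 15 kW. A conventional layout would therefore need about 5.3 times as many racks for the same power.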

The term entered mainstream usage with Jensen Huang's 2024 framing of NVIDIA's H100 and Blackwell deployments as AI factories rather than data centers. The framing emphasizes that the output is tokens (or trained models), not virtualized compute hours, and the economics follow factory logic: continuous capacity utilization, large fixed costs amortized over throughput.
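The factory logic can be sketched as simple arithmetic: a fixed annual cost divided by the tokens the facility actually produces. The inputs below (annual fixed cost, aggregate tokens per second, utilization) are hypothetical placeholders, and `cost_per_million_tokens` is an illustrative helper. The sketch deliberately ignores variable costs such as electricity. It only shows why utilization drives the per-token share of fixed cost.

```python
# Sketch of "factory logic": fixed cost amortized over token output.
# All inputs are hypothetical placeholders; variable costs (e.g. power) are ignored.

SECONDS_PER_YEAR = 365 * 24 * 3600


def cost_per_million_tokens(annual_fixed_cost_usd: float,
                            tokens_per_second: float,
                            utilization: float) -> float:
    """Fixed cost (USD) attributed to each one million output tokens.

    tokens_per_second: aggregate facility throughput at full load.
    utilization: fraction of the year the capacity is actually busy, in (0, 1].
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    tokens_per_year = tokens_per_second * utilization * SECONDS_PER_YEAR
    return annual_fixed_cost_usd / tokens_per_year * 1_000_000


# Hypothetical facility: $3.1536B/year fixed cost, 1e8 tokens/s aggregate.
for u in (1.0, 0.9, 0.5):
    cost = cost_per_million_tokens(3.1536e9, 1e8, u)
    print(f"utilization {u:.0%}: ${cost:.2f} per 1M tokens")
```

Halving utilization doubles the fixed cost carried by each token. That is the sense in which factory economics reward continuous capacity utilization.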

For sovereign and open-source AI, the AI-factory model is the centralizing pressure. The capital threshold to build one is high enough that the natural number of operators is small. This is part of why decentralized-training research (Pluralis, Templar, Prime Intellect) matters as a counterweight.

Sources

Jensen Huang on AI factories (NVIDIA GTC 2024 keynote)

Mentioned in

direct-to-chip cooling
