The Open-Source AI Stack

Glossary

confidential computing

The umbrella category of compute architectures where workloads run isolated from the host operator, combining hardware TEEs, attestation, and encrypted-memory protections.

The architecture pattern that lets a workload run on infrastructure whose operator the workload owner does not trust. It has three components: a hardware-isolated runtime (a TEE on the CPU, NVIDIA’s confidential-computing mode on H100/H200, or similar GPU isolation), attestation that proves the expected code is running, and encrypted communication channels that keep secrets out of the operator’s reach end-to-end.
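The attestation step can be sketched as a client-side check that runs before any prompt is sent. This is a minimal Python sketch under loud assumptions: real TEEs sign reports with an asymmetric key certified by the hardware vendor, whereas here an HMAC over a hypothetical `VENDOR_KEY` stands in for that signature so the example runs on the standard library alone. `AttestationReport`, `enclave_attest`, and `verify` are illustrative names, not any vendor’s API. The signed report binds three things: a measurement (a hash of the loaded code and model), a verifier-chosen nonce that rules out replayed reports, and the enclave’s public key for the encrypted channel.

```python
import hashlib
import hmac
import json
import secrets
from dataclasses import dataclass

# ASSUMPTION: a symmetric stand-in for the hardware vendor's signing key.
# Real TEEs sign with an asymmetric, vendor-certified key; HMAC is used here
# only so the sketch runs on the standard library.
VENDOR_KEY = secrets.token_bytes(32)


@dataclass
class AttestationReport:
    measurement: str  # SHA-256 of the code and model loaded in the enclave
    nonce: str        # verifier-chosen challenge, proves the report is fresh
    public_key: str   # enclave's key for the encrypted channel
    signature: str    # "hardware" signature over the three fields above


def _sign(measurement: str, nonce: str, public_key: str) -> str:
    # Canonical serialization so signer and verifier hash identical bytes.
    payload = json.dumps(
        {"measurement": measurement, "nonce": nonce, "public_key": public_key},
        sort_keys=True,
    ).encode()
    return hmac.new(VENDOR_KEY, payload, hashlib.sha256).hexdigest()


def enclave_attest(loaded: bytes, nonce: str, public_key: str) -> AttestationReport:
    """Simulates the hardware measuring what it loaded and signing a report."""
    measurement = hashlib.sha256(loaded).hexdigest()
    return AttestationReport(measurement, nonce, public_key,
                             _sign(measurement, nonce, public_key))


def verify(report: AttestationReport, expected_measurement: str, nonce: str) -> bool:
    """Client-side checks to run before sending any prompt."""
    genuine = hmac.compare_digest(
        report.signature,
        _sign(report.measurement, report.nonce, report.public_key),
    )
    fresh = report.nonce == nonce
    expected_code = report.measurement == expected_measurement
    return genuine and fresh and expected_code


approved_build = b"inference-engine build + model weights"  # hypothetical
expected = hashlib.sha256(approved_build).hexdigest()
nonce = secrets.token_hex(16)

report = enclave_attest(approved_build, nonce, "enclave-pubkey")
tampered = enclave_attest(b"build that logs prompts", nonce, "enclave-pubkey")
print(verify(report, expected, nonce))    # True
print(verify(tampered, expected, nonce))  # False
```

In this sketch the client opens the encrypted channel to `public_key` only after `verify` returns True. Because the key is covered by the signature, an operator cannot swap in its own endpoint without invalidating the report.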

For AI, confidential computing matters because hosted inference is otherwise a trust nightmare: prompts and outputs pass through provider infrastructure that could log, train on, or leak them. Apple Private Cloud Compute (custom Apple-silicon servers), AWS Nitro Enclaves, Anthropic Confidential inference, and projects like Maple AI all build on confidential-compute primitives to give users cryptographic rather than merely contractual privacy guarantees.

The remaining gaps are GPU coverage (only NVIDIA Hopper and Blackwell support confidential modes; AMD and Intel GPUs lag), performance overhead (10 to 30 percent on transformer workloads, depending on mode), and the attestation chain (the user needs to trust the hardware vendor’s signing keys).
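To make the overhead range concrete, a back-of-the-envelope capacity estimate helps. This Python sketch assumes that the 10 to 30 percent overhead means extra time per generated token, so per-GPU throughput divides by (1 + overhead). The 1,000 tokens/s baseline and 8,000 tokens/s demand are hypothetical numbers, not measurements, and `confidential_capacity` is an illustrative helper.

```python
import math


def confidential_capacity(baseline_tps: float, overhead: float,
                          target_tps: float) -> tuple[float, int]:
    """Per-GPU throughput and GPU count once confidential mode is on.

    ASSUMPTION: `overhead` is extra time per token, so throughput
    scales by 1 / (1 + overhead).
    """
    effective_tps = baseline_tps / (1 + overhead)
    gpus_needed = math.ceil(target_tps / effective_tps)
    return effective_tps, gpus_needed


# Hypothetical fleet: 1,000 tokens/s per GPU, 8,000 tokens/s of demand.
for overhead in (0.0, 0.10, 0.30):
    tps, gpus = confidential_capacity(1000, overhead, 8000)
    print(f"{overhead:.0%} overhead: {tps:.0f} tokens/s per GPU, {gpus} GPUs")
```

Under this reading, the loop reports 8, 9, and 11 GPUs: the top of the range turns an eight-GPU deployment into an eleven-GPU one, which is why the mode-dependent overhead matters for capacity planning.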
