Glossary
TEE
A hardware-isolated CPU region where code and data are protected from inspection by the host OS, used to run inference in a way the operator cannot read or modify.
A CPU mode (more recently extended to GPUs) where a guest workload runs in encrypted memory, with the hypervisor and host OS unable to read the contents. The four main real-world implementations are Intel TDX, AMD SEV-SNP, ARM CCA, and NVIDIA H100/H200 Confidential Compute. Each provides hardware-rooted attestation that the running code matches an expected measurement.
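The measurement check behind attestation can be sketched in a few lines. This is a toy model under stated assumptions, not the report format or API of TDX, SEV-SNP, CCA, or NVIDIA Confidential Compute: the names `Report`, `measure`, `sign_report`, and `verify_report` are hypothetical, and an HMAC with a shared demo key stands in for the vendor's asymmetric signature chain so the example runs on the standard library alone.

```python
import hashlib
import hmac
import os
from dataclasses import dataclass

# Hypothetical stand-in for the hardware vendor's signing key. Real TEEs sign
# reports with an asymmetric key chained to a vendor certificate; HMAC is used
# here only so the sketch runs with the standard library.
VENDOR_KEY = b"demo-vendor-key"


@dataclass(frozen=True)
class Report:
    measurement: bytes  # hash of the code loaded into the TEE
    nonce: bytes        # verifier-chosen value that proves freshness
    signature: bytes    # vendor-rooted signature over measurement + nonce


def measure(code: bytes) -> bytes:
    """Toy measurement: a SHA-384 digest of the loaded code image."""
    return hashlib.sha384(code).digest()


def sign_report(code: bytes, nonce: bytes) -> Report:
    """Simulate the report the hardware would produce for a running workload."""
    m = measure(code)
    sig = hmac.new(VENDOR_KEY, m + nonce, hashlib.sha384).digest()
    return Report(m, nonce, sig)


def verify_report(report: Report, expected: bytes, nonce: bytes) -> bool:
    """Accept only a fresh, correctly signed report with the expected measurement."""
    want_sig = hmac.new(
        VENDOR_KEY, report.measurement + nonce, hashlib.sha384
    ).digest()
    return (
        hmac.compare_digest(report.signature, want_sig)
        and hmac.compare_digest(report.nonce, nonce)
        and hmac.compare_digest(report.measurement, expected)
    )


if __name__ == "__main__":
    engine = b"inference-engine-build-1"
    expected = measure(engine)   # published by whoever built the engine
    nonce = os.urandom(16)       # fresh challenge from the remote verifier
    print(verify_report(sign_report(engine, nonce), expected, nonce))
    print(verify_report(sign_report(b"tampered", nonce), expected, nonce))
```

In this sketch the verifier releases data to the workload only when the signature, the nonce, and the measurement all check out; a report for any other code image, or a replayed report carrying an old nonce, is rejected.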
For AI, the relevant property is “the operator cannot see your prompt or read the model weights.” A team running inference inside a TEE on a third-party cloud can give its users cryptographic evidence that the operator could not read their prompts. This is the technical foundation for confidential AI products like Apple Private Cloud Compute, Anthropic’s Confidential inference, and several smaller open-source efforts (Maple AI, NEAR Confidential AI).
The trust model is real but not absolute. A TEE still requires trusting the hardware vendor; side-channel attacks against TDX and SEV-SNP have been demonstrated; and performance overhead is meaningful for GPU workloads. Still, TEE-based serving is the only practical “operator-blind” inference architecture at production scale.