Glossary

spot instance

A discounted cloud instance that the provider can reclaim with little warning, used for fault-tolerant training and batch inference where absorbing interruptions costs less than paying for guaranteed capacity.

Compute. Also known as: preemptible instance, spot GPU

Cloud capacity that the provider sells at a deep discount to its on-demand price (often 60 to 90 percent off) on the condition that it can reclaim the instance on short notice: two minutes on AWS, about 30 seconds on Google Cloud and Azure. AWS calls it Spot Instances; Google Cloud calls it Spot VMs (formerly Preemptible VMs); Azure calls it Spot Virtual Machines. All three operate on similar economics.

For AI workloads the use cases stratify. Production inference: rarely runs on spot because user-facing latency hates eviction. Batch inference: often runs on spot with retry-on-eviction handling. Pretraining research: runs on spot with frequent checkpointing and elastic scaling, since the discount swamps the operational complexity.
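
The checkpoint-and-retry pattern mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not any provider's API: the `Evicted` exception, the `evict_at` parameter, and the JSON checkpoint file are all hypothetical stand-ins for a real reclaim notice and a real model checkpoint. The point it shows is that a job which saves state periodically and reloads it on restart reaches the same result no matter where it is interrupted.

```python
import json
import os
import tempfile


class Evicted(Exception):
    """Hypothetical signal standing in for a provider reclaim notice."""


def save_checkpoint(path, state):
    # Write to a temp file, then rename, so an eviction mid-write
    # never leaves a half-written checkpoint behind.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def load_checkpoint(path):
    # Resume from the last saved state, or start fresh if none exists.
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "total": 0}


def run(path, total_steps, checkpoint_every, evict_at=None):
    """Work from the last checkpoint; raise Evicted at step `evict_at` (simulated)."""
    state = load_checkpoint(path)
    while state["step"] < total_steps:
        if evict_at is not None and state["step"] == evict_at:
            raise Evicted(f"reclaimed at step {state['step']}")
        state["total"] += state["step"]  # stand-in for one unit of real work
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state


def run_with_retries(path, total_steps, checkpoint_every, evictions):
    """Restart after each simulated eviction until the job completes."""
    pending = list(evictions)
    restarts = 0
    while True:
        evict_at = pending.pop(0) if pending else None
        try:
            return run(path, total_steps, checkpoint_every, evict_at), restarts
        except Evicted:
            restarts += 1
```

Work done between the last checkpoint and the eviction is repeated after the restart, which is why checkpoint frequency is the main tuning knob: more frequent checkpoints cost more I/O but waste less compute per interruption.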

The pattern matters for open-source AI economics. Renting H100 capacity at on-demand prices is unaffordable for most independent researchers; renting it on spot at one-fifth the price puts non-frontier-but-still-serious work within reach. Whether spot capacity will stay available in the GPU-scarcity era is itself an open question.
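
To put rough numbers on the one-fifth figure, the sketch below estimates spot spend when evictions force some work to be redone. The function names, the `rework_fraction` parameter, and the hourly rate used in the usage note are assumptions for illustration only, not real prices or a provider billing formula.

```python
def spot_cost(on_demand_hourly, discount, useful_hours, rework_fraction=0.0):
    """Estimated spend to finish `useful_hours` of work on spot capacity.

    `rework_fraction` is the assumed share of extra billed hours lost to
    evictions (recomputing since the last checkpoint, restart time).
    """
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    billed_hours = useful_hours * (1.0 + rework_fraction)
    return on_demand_hourly * (1.0 - discount) * billed_hours


def break_even_rework(discount):
    """Rework fraction at which spot spend equals on-demand spend."""
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    return 1.0 / (1.0 - discount) - 1.0
```

With a hypothetical on-demand rate of 10 per hour, an 80 percent discount, and 20 percent rework, `spot_cost(10.0, 0.8, 100, 0.2)` comes to roughly 240 against 1,000 on demand. `break_even_rework(0.8)` is about 4, meaning evictions would have to add four times the useful hours before spot stopped being cheaper under these assumptions.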

Sources

AWS Spot Instances documentation
