Glossary
batching
Grouping multiple requests or training examples into a single forward or backward pass, the lever that turns GPU compute density into throughput.
The practice of running many examples through a model in parallel. GPUs are compute-dense and memory-bandwidth-limited; running one sequence at a time leaves most of the compute idle. Batching amortizes the memory traffic, chiefly reading the model weights, across more useful work: each weight loaded from memory is reused for every example in the batch.
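A rough way to see the amortization is to count FLOPs per byte moved for a single linear layer. This is a back-of-the-envelope model, not a profiler. It assumes fp16 weights and activations, a hypothetical 4096 x 4096 layer, and that the weight matrix is read from memory once per pass and shared by every row of the batch. The helper names are illustrative.

```python
# Back-of-the-envelope sketch: arithmetic intensity of one linear layer (y = x @ W)
# as the batch size grows. Assumes fp16 weights and activations (2 bytes each) and
# that W is read from memory once per pass, shared by all rows of x.

BYTES_PER_ELEM = 2  # fp16 (assumption)


def linear_layer_cost(batch: int, d_in: int, d_out: int) -> tuple[int, int]:
    """Return (flops, bytes_moved) for a [batch, d_in] input times a [d_in, d_out] weight."""
    flops = 2 * batch * d_in * d_out                      # one multiply + one add per weight, per row
    weight_bytes = d_in * d_out * BYTES_PER_ELEM          # read once, regardless of batch size
    act_bytes = batch * (d_in + d_out) * BYTES_PER_ELEM   # read the input, write the output
    return flops, weight_bytes + act_bytes


def arithmetic_intensity(batch: int, d_in: int = 4096, d_out: int = 4096) -> float:
    """FLOPs performed per byte moved; higher means the GPU's compute is better used."""
    flops, bytes_moved = linear_layer_cost(batch, d_in, d_out)
    return flops / bytes_moved


if __name__ == "__main__":
    for b in (1, 8, 64, 256):
        print(f"batch={b:4d}  intensity={arithmetic_intensity(b):7.1f} FLOPs/byte")
```

While the weight read dominates the traffic, intensity grows almost linearly with the batch size (about 1 FLOP per byte at batch 1), which is why a single sequence leaves a compute-dense GPU mostly idle.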
For training, the batch size is mostly a quality knob (small batches generalize differently; large batches require learning-rate tuning). For inference, batching is the central knob: larger batches mean higher throughput per GPU but higher per-request latency.
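The throughput/latency trade can be illustrated with a toy roofline model of one decode step, in which every sequence in the batch produces one token. Every number in it (a 7B-parameter fp16 model, 2 TB/s of memory bandwidth, 300 TFLOP/s of compute) is a hypothetical placeholder, and KV-cache reads are ignored, so treat the output as qualitative.

```python
# Toy roofline model of one decode step (each sequence in the batch emits one token).
# All hardware and model numbers are hypothetical placeholders, and KV-cache
# traffic is ignored; the output is qualitative only.

PARAMS = 7e9          # hypothetical 7B-parameter model
BYTES_PER_PARAM = 2   # fp16 weights (assumption)
MEM_BW = 2e12         # bytes/s, hypothetical memory bandwidth
PEAK_FLOPS = 300e12   # FLOP/s, hypothetical dense fp16 peak


def decode_step_seconds(batch: int) -> float:
    """Time for one decode step: the slower of streaming the weights and doing the math."""
    memory_time = PARAMS * BYTES_PER_PARAM / MEM_BW   # weights read once per step, shared by the batch
    compute_time = 2 * PARAMS * batch / PEAK_FLOPS    # ~2 FLOPs per parameter per generated token
    return max(memory_time, compute_time)


def throughput_tokens_per_s(batch: int) -> float:
    """Aggregate tokens per second across the whole batch."""
    return batch / decode_step_seconds(batch)


if __name__ == "__main__":
    for b in (1, 16, 64, 256, 512):
        step_ms = decode_step_seconds(b) * 1e3
        print(f"batch={b:4d}  per-token latency={step_ms:6.2f} ms  "
              f"throughput={throughput_tokens_per_s(b):9.0f} tok/s")
```

In this simplified model, per-token latency stays flat while the step is bound by weight reads and only rises once compute becomes the bottleneck (around batch 150 with these numbers), while throughput keeps climbing. Real runtimes see latency rise earlier, because KV-cache traffic grows with both batch size and context length.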
Continuous batching is the inference-time pattern that makes batching dynamic per token instead of per request: the scheduler admits new requests and retires finished ones at every decode step, so short requests no longer wait for the longest one in their batch. Combined with PagedAttention for KV-cache management, modern runtimes maintain batches of hundreds of concurrent requests per GPU on production workloads.
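The difference between request-level and iteration-level scheduling shows up even in a toy simulation. This sketch counts decode steps only: each request is reduced to a hypothetical number of output tokens (at least 1), every step is assumed to cost the same regardless of how many slots are full, and prefill, arrival times and KV-cache capacity (the part PagedAttention manages) are ignored. The function names are illustrative, not any runtime's API.

```python
# Toy scheduler comparison: static (request-level) vs continuous (iteration-level)
# batching. Requests are just output lengths (>= 1); prefill, arrivals and
# KV-cache limits are ignored, and every decode step is assumed to cost the same.
from collections import deque


def static_batching_steps(lengths: list[int], max_batch: int) -> int:
    """Run requests in fixed groups; each group holds the GPU until its longest member finishes."""
    steps = 0
    for start in range(0, len(lengths), max_batch):
        steps += max(lengths[start:start + max_batch])
    return steps


def continuous_batching_steps(lengths: list[int], max_batch: int) -> int:
    """Refill free slots before every decode step instead of waiting for the whole batch."""
    waiting = deque(lengths)
    running: list[int] = []  # tokens still to generate for each in-flight request
    steps = 0
    while waiting or running:
        # Admit queued requests into any free slots.
        while waiting and len(running) < max_batch:
            running.append(waiting.popleft())
        # One decode step: every running request emits one token.
        running = [r - 1 for r in running]
        steps += 1
        # Retire finished requests; their slots are reusable on the next step.
        running = [r for r in running if r > 0]
    return steps


if __name__ == "__main__":
    lengths = [100, 5, 5, 5, 100, 5, 5, 5]  # hypothetical mix of long and short requests
    print(static_batching_steps(lengths, 4))      # → 200
    print(continuous_batching_steps(lengths, 4))  # → 105
```

With four slots, static batching needs 200 steps because each group waits for its 100-token request, while continuous batching finishes in 105 by backfilling the slots freed by the short requests.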