Glossary
transformer
The neural network architecture that combines self-attention with feed-forward layers, dominant for language modeling since 2017 and the substrate for nearly every modern LLM.
A stacked sequence of identical blocks. Each block has two sublayers: a
self-attention mechanism that lets every token see every prior token,
and a position-wise feed-forward network (FFN) that processes each
position independently. Residual connections and layer norms surround
both sublayers; positional encodings inject sequence order into the
otherwise position-blind attentionruntimeThe transformer operation where each token computes a weighted average over all earlier tokens, with weights derived from learned similarity between query and key vectors.
Open full entry .
The original 2017 paper introduced the architecture for machine translation. Decoder-only transformers became dominant for language modeling by 2020 with GPT-3 and remain the standard shape: hundreds of billions of parameters across dozens or hundreds of stacked blocks, trained autoregressively on web-scale text.
Variants reshape one component without changing the bones. MoE models
replace the FFN with a sparse mixture of expertsweightsA model architecture where each token activates only a fraction of total parameters by routing through learned expert subnetworks, decoupling capacity from compute.
Open full entry . Mamba and similar
state-space architectures swap attentionruntimeThe transformer operation where each token computes a weighted average over all earlier tokens, with weights derived from learned similarity between query and key vectors.
Open full entry for a recurrent operator with
linear-time complexity. Diffusion language models (Gemini Diffusion,
LLaDA-style) replace next-token prediction with iterative denoising.
None has displaced the transformer at the frontierweightsThe current capability envelope of AI, defined by the most capable models in deployment at any given time; an evolving label rather than a fixed threshold.
Open full entry as of 2026.