Glossary

RoPE

A positional encoding that rotates query and key vectors in two-dimensional subspaces by an angle proportional to their position, so that attention scores depend on relative rather than absolute position.

Category: Runtime (also Training, Weights). Also known as: rotary positional embedding, rotary position embedding.

A way to inject sequence position into attention. RoPE splits each query and key vector into 2D pairs, rotates each pair by an angle that depends on the token’s position and a per-pair frequency, and lets the dot product fall out as a function of relative position only. The math is clean and the implementation is a small modification to the attention kernel.
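The rotation described above can be sketched in a few lines of plain Python. This is a minimal illustration, not any particular model's kernel: the helper names `rope_rotate` and `dot`, the sample vectors, and the choice to pair adjacent dimensions are all assumptions made for the example (some implementations pair the first half of the vector with the second half instead). The default base of 10000 is likewise only an illustrative value.

```python
import math

def rope_rotate(vec, position, base=10000.0):
    """Rotate each adjacent (even, odd) pair of vec by a position-dependent angle."""
    dim = len(vec)
    if dim % 2 != 0:
        raise ValueError("vector dimension must be even")
    out = []
    for i in range(0, dim, 2):
        # Per-pair frequency: base^(-i/dim), where i is the even index of the pair.
        freq = base ** (-i / dim)
        angle = position * freq
        c, s = math.cos(angle), math.sin(angle)
        x, y = vec[i], vec[i + 1]
        # Standard 2D rotation of the pair (x, y) by `angle`.
        out.extend([x * c - y * s, x * s + y * c])
    return out

def dot(a, b):
    """Plain dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))

# Hypothetical query and key vectors, chosen only for illustration.
q = [0.3, -1.2, 0.7, 0.5]
k = [1.1, 0.4, -0.6, 0.9]

# Same relative offset (3) at two different absolute positions.
score_a = dot(rope_rotate(q, 5), rope_rotate(k, 2))
score_b = dot(rope_rotate(q, 105), rope_rotate(k, 102))
```

Under these assumptions, `score_a` and `score_b` agree up to floating-point error, because rotating both vectors by the same extra angle in each 2D pair leaves their dot product unchanged.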

RoPE is the positional encoding used in Llama (all generations), Qwen, Mistral, DeepSeek, Gemma, and most other modern open weights families. The frequency base parameter (often called theta or rope_theta) controls the rotation rate; raising it during fine-tuning extends the usable context window without retraining from scratch (YaRN, NTK-aware, and dynamic scaling are variants of this trick).
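To see why the frequency base matters for context length, it can help to look at the wavelength of each 2D pair, that is, the number of token positions it takes for that pair to complete one full rotation. The sketch below is illustrative only: `rope_wavelengths` is a hypothetical helper, and the head dimension of 128 and the two base values are assumed numbers, not settings taken from any specific model. It does not implement YaRN, NTK-aware, or dynamic scaling; it only shows the effect of changing the base.

```python
import math

def rope_wavelengths(head_dim, base):
    """Wavelength in tokens of each 2D pair: 2*pi divided by the pair's frequency."""
    # The frequency of the pair starting at even index i is base^(-i/head_dim),
    # so its wavelength is 2*pi * base^(i/head_dim).
    return [2 * math.pi * base ** (i / head_dim) for i in range(0, head_dim, 2)]

# Illustrative values only (assumptions for this example).
short = rope_wavelengths(128, 10_000.0)
long = rope_wavelengths(128, 1_000_000.0)

# The fastest pair is unaffected by the base; the slowest pair rotates far more slowly.
print(f"fastest pair: {short[0]:.2f} vs {long[0]:.2f} tokens per rotation")
print(f"slowest pair: {short[-1]:.0f} vs {long[-1]:.0f} tokens per rotation")
```

A larger base stretches the long-wavelength pairs, so positions far apart still map to distinguishable rotation angles in those pairs, which is the intuition behind raising the base to reach longer contexts.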

It replaced absolute and learned positional embeddings as the default during 2022 and 2023 and is now used in nearly every production implementation of attention.
