Glossary
embedding
A fixed-size vector representation of a piece of text, learned so that semantically similar texts land near each other in the vector space; it is the basis for vector search and most retrieval-augmented generation (RAG) systems.
A vector of typically 384 to 4096 floats produced by an embedding model from a chunk of text. The model is trained so that texts with similar meaning produce vectors with high cosine similarity. The embedding becomes a queryable handle on the text: index a million chunks as vectors, embed a query, retrieve the nearest neighbors.
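The "embed a query, retrieve the nearest neighbors" step can be sketched as a brute-force cosine-similarity search. This is a minimal illustration, not any particular vector database's API. The vectors are invented 4-dimensional toys standing in for real model output, and the names `cosine_similarity`, `top_k`, and the chunk ids are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query: list[float], index: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    """Brute-force nearest-neighbor search: score every chunk, keep the k best."""
    scored = [(chunk_id, cosine_similarity(query, vec)) for chunk_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical 4-dimensional "embeddings" invented for illustration;
# a real embedding model would emit hundreds or thousands of dimensions.
index = {
    "chunk-cats": [0.9, 0.1, 0.0, 0.1],
    "chunk-dogs": [0.8, 0.3, 0.1, 0.0],
    "chunk-tax-law": [0.0, 0.1, 0.9, 0.4],
}
query = [0.85, 0.2, 0.05, 0.05]  # stands in for an embedded query about pets

for chunk_id, score in top_k(query, index):
    print(chunk_id, round(score, 3))
```

Scanning every vector is linear in the index size. At the million-chunk scale, systems typically switch to an approximate nearest-neighbor index that trades a little recall for much faster lookups.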
The model that produces embeddings is separate from the generative model that consumes the retrieved text. Open embedding models (BGE, GTE, Nomic, mxbai) compete on the MTEB benchmark, which evaluates retrieval, classification, clustering, and reranking quality across languages and domains.
Dimensionality is a knob. Smaller vectors mean cheaper storage and faster nearest-neighbor search, at some cost in quality. The Matryoshka training approach (introduced in 2022 and widely adopted by embedding models in 2024) produces embeddings whose first N dimensions are usable on their own, letting a single model serve multiple size-quality tradeoffs.
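A rough sketch of the size-quality knob follows, assuming a Matryoshka-trained model. You keep the first N coordinates and then re-normalize to unit length, because truncation shrinks the norm and dot-product indexes usually assume unit vectors. The helper names (`l2_normalize`, `truncate_embedding`, `index_size_bytes`) and the 8-dimensional vector are hypothetical. The storage arithmetic assumes uncompressed float32 (4 bytes per dimension).

```python
import math


def l2_normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit length so dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def truncate_embedding(vec: list[float], dims: int) -> list[float]:
    """Keep the first `dims` coordinates and re-normalize.

    Assumes a Matryoshka-trained model; truncating an ordinary embedding
    this way can degrade quality much more sharply.
    """
    if not 0 < dims <= len(vec):
        raise ValueError(f"dims must be in 1..{len(vec)}, got {dims}")
    return l2_normalize(vec[:dims])


def index_size_bytes(num_vectors: int, dims: int, bytes_per_float: int = 4) -> int:
    """Raw storage for a flat float32 index, ignoring index overhead."""
    return num_vectors * dims * bytes_per_float


full = l2_normalize([0.5, -0.3, 0.8, 0.1, 0.05, -0.02, 0.01, 0.0])  # invented vector
small = truncate_embedding(full, 4)
print(len(small), round(sum(x * x for x in small), 6))

# One million chunks: full-size vs. truncated storage.
print(index_size_bytes(1_000_000, 1024))  # 4.096e9 bytes, about 4.1 GB
print(index_size_bytes(1_000_000, 256))   # 1.024e9 bytes, about 1.0 GB
```

Cutting from 1024 to 256 dimensions shrinks raw storage fourfold. Whether the retrieval quality holds up depends on the model and should be checked on your own data.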