Glossary

RAG

A pattern where a model retrieves relevant documents from an external store at query time and conditions its answer on them, instead of relying only on parametric knowledge.

Category: Retrieval and Memory (also: Agents). Also known as: retrieval-augmented generation.

The architectural pattern that bridges fixed-weight models and changing data. A user query is embedded, the embedding is matched against a vector database of pre-embedded document chunks, the top-k chunks are inserted into the prompt as context, and the model generates an answer conditioned on them. Citations naturally fall out as references to the retrieved chunks.
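The flow described above (embed the query, match it against pre-embedded chunks, insert the top-k into the prompt) can be sketched in a few lines. This is a minimal illustration only: `embed` is a toy bag-of-words stand-in for a learned embedding model, the in-memory list `CHUNKS` stands in for a vector database, and every function name and the prompt wording are assumptions made for this example.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, chunks, k=2):
    """Return the k (chunk_id, text) pairs most similar to the query."""
    q = embed(query)
    ranked = sorted(enumerate(chunks), key=lambda pair: cosine(q, embed(pair[1])), reverse=True)
    return ranked[:k]

def build_prompt(query, retrieved):
    """Insert retrieved chunks into the prompt, tagged with ids the model can cite."""
    context = "\n".join(f"[{i}] {text}" for i, text in retrieved)
    return (
        "Answer using only the context below and cite chunk ids.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )

# Hypothetical document chunks; a real system would store precomputed embeddings.
CHUNKS = [
    "The refund window is 30 days from the delivery date.",
    "Support is available by email on weekdays.",
    "Refund requests need the original order number.",
]

if __name__ == "__main__":
    question = "How long is the refund window?"
    print(build_prompt(question, retrieve(question, CHUNKS, k=2)))
```

The chunk ids in the prompt are what make citations possible: the model can refer back to `[0]` or `[2]`, and the application can map those ids to source paragraphs. A production system would replace the linear scan with approximate nearest-neighbor search.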

RAG matters because it sidesteps the freshness problem (parametric knowledge is frozen at the training cutoff) and the source-attribution problem (a RAG answer can point at a specific paragraph). It underpins most production chatbots that answer questions over proprietary data.

Complications stack. Chunking strategy matters: too small loses context, too large blunts retrieval. Reranking is usually needed because vector similarity alone is noisy. Long-context models do not obsolete RAG: the “lost in the middle” effect means even a 1M-token window does not match precise retrieval into a smaller working context.
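To make the chunking and reranking knobs concrete, here is a small sketch under stated assumptions. `chunk_words` splits on whitespace with a fixed size and overlap (real pipelines often split on tokens, sentences, or document structure), and `term_overlap` is a deliberately crude stand-in for the cross-encoder a real reranker would use. All names and default values are hypothetical.

```python
def chunk_words(text, size=5, overlap=2):
    """Split text into word windows of `size`, each sharing `overlap` words with the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks

def term_overlap(query, passage):
    """Toy relevance score: fraction of query terms found in the passage.
    Assumption: stands in for a cross-encoder, which would score the pair jointly."""
    q = set(query.lower().split())
    p = set(passage.lower().split())
    return len(q & p) / len(q) if q else 0.0

def rerank(query, candidates, score_fn=term_overlap, top_n=2):
    """Second pass: rescore first-stage candidates and keep the best top_n."""
    return sorted(candidates, key=lambda c: score_fn(query, c), reverse=True)[:top_n]

if __name__ == "__main__":
    print(chunk_words("a b c d e f g h", size=5, overlap=2))
    candidates = ["refund policy details", "shipping times", "refund window is 30 days"]
    print(rerank("refund window", candidates))
```

Raising `size` keeps more surrounding context per chunk but makes each embedding less specific. Raising `overlap` reduces the chance that a relevant sentence is cut at a boundary, at the cost of storing more chunks.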

