Retrieval
Retrieval is the process of finding and fetching relevant information from external data sources so it can be included in an LLM's context at inference time, supplying knowledge that is absent from or unreliable in the model's weights.
Details
A retrieval pipeline typically has three stages: candidate generation, ranking, and result formatting. Candidate generation casts a wide net using fast, approximate methods - embedding-based nearest-neighbor search from a vector database, keyword search (BM25), or hybrid search that combines both. Ranking narrows the candidates: a reranking step rescores them with a higher-quality model so that only the most relevant items proceed. Result formatting prepares the selected items for inclusion in the prompt - truncating, reordering, or adding metadata - to fit within the model's context size.
Retrieval quality directly determines output quality in any system that depends on external information: missing relevant documents or surfacing irrelevant ones degrades generation regardless of model capability. The most common application is RAG, where retrieved passages are injected into context to augment generation. Retrieval also serves distinct roles in grounding (anchoring outputs in verifiable sources), agent memory (recalling past interactions by semantic similarity), and knowledge-cutoff mitigation (supplying information the model lacks from training). In agentic RAG, retrieval becomes iterative: an agent issues multiple retrieval calls, reformulates queries, and decides when results are sufficient.
Retrieval quality is a key lever in context engineering. The data sources feeding retrieval are also an attack surface: context poisoning can manipulate model outputs by injecting or modifying documents in the retrieval backend.
Examples
- A RAG pipeline that embeds a user query, retrieves the top 50 chunks from a vector database, reranks them, and passes the top 5 into the prompt.
- A coding agent that searches a codebase by file name and content to gather relevant source files before generating a change.
- A web search tool that queries an external search engine and returns snippets for inclusion in context.
- An agent memory system that retrieves past conversation summaries by semantic similarity to the current query.
Synonyms
information retrieval, knowledge retrieval