RAG
RAG (retrieval-augmented generation) is a pattern that enhances LLM outputs by retrieving relevant documents or passages from an external source and including them in the model's context before generation.
Details
Rather than relying solely on knowledge encoded in the model's weights during training, RAG systems chunk source documents, convert them to embeddings with an embedding model, store them in a retrieval backend (vector database, search index, database, or API), and at inference time query that backend and inject the results into the prompt.
This supplies the model with up-to-date or domain-specific information without fine-tuning, overcoming the model's knowledge cutoff. RAG is one of the most common approaches to grounding - anchoring outputs in verifiable sources rather than relying solely on parametric knowledge.
A standard RAG pipeline is a common example of an AI workflow - a fixed sequence of retrieve, optionally rerank, then generate. Retrieval quality - candidate generation, ranking, and result formatting - directly determines output quality and is a core lever in context engineering. In agentic RAG, retrieval becomes part of an agent tool loop where the model dynamically decides what to retrieve and whether to refine or re-retrieve based on intermediate results.
RAG reduces hallucination when retrieved content is relevant and consistent, but does not eliminate it - the model can still misinterpret, ignore, or fabricate beyond retrieved content. RAG can also increase hallucination when retrieved context is irrelevant, contradictory, or low-quality: the model may confabulate more aggressively when attempting to reconcile conflicting signals than it would without retrieval at all. The indexed corpus is a trust boundary: anyone who can write to it can influence model outputs, because retrieved documents enter the prompt as context. This makes the corpus an attack surface for context poisoning, and indirect prompt injection through stored documents is the most common attack vector against RAG systems.
Examples
- A support chatbot that retrieves relevant help articles before answering a user question.
- A coding agent that searches a codebase for relevant files and includes them in context before generating a change.
- A legal research tool that retrieves statute text and case excerpts to ground its analysis.
Synonyms
retrieval-augmented generation, retrieval augmented generation