Prompt Compaction
Prompt compaction is the process of reducing the token count of a prompt while preserving the information the LLM needs to perform its task, typically to stay within context size limits or to reduce latency and inference cost.
Details
As conversations, tool outputs, and retrieved documents accumulate, the assembled context can approach or exceed a model's context window. Prompt compaction addresses this by condensing or removing content before it is sent to the model. Common techniques include summarizing earlier conversation turns into a shorter state block, dropping or truncating old messages, compressing RAG passages to their key claims, and replacing verbose tool outputs with structured summaries. These techniques are a core part of context engineering.
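The first technique above, summarizing earlier turns into a shorter state block, can be sketched as follows. This is a minimal illustration, not a production implementation: the summarizer here is a deterministic stub standing in for an actual LLM summarization call, and the message format loosely mirrors the common role/content chat structure.

```python
# History compaction sketch: keep the most recent turns verbatim and
# collapse everything older into a single "state of the conversation"
# block. `stub_summarize` is a placeholder for an LLM call.

def stub_summarize(messages):
    """Placeholder summarizer; keeps each turn's first 40 characters."""
    lines = [f"- {m['role']}: {m['content'][:40]}" for m in messages]
    return "Conversation so far (compacted):\n" + "\n".join(lines)

def compact_history(messages, keep_last=2, summarize=stub_summarize):
    """Replace all but the last `keep_last` messages with one summary block."""
    if len(messages) <= keep_last:
        return messages
    older, recent = messages[:-keep_last], messages[-keep_last:]
    state_block = {"role": "system", "content": summarize(older)}
    return [state_block] + recent

history = [
    {"role": "user", "content": "Plan a 3-day trip to Kyoto in autumn."},
    {"role": "assistant", "content": "Day 1: Fushimi Inari; Day 2: Arashiyama; Day 3: Gion."},
    {"role": "user", "content": "Swap day 2 for a museum day."},
    {"role": "assistant", "content": "Done: Day 2 is now the Kyoto National Museum."},
    {"role": "user", "content": "What's the current plan?"},
]
compacted = compact_history(history, keep_last=2)
print(len(compacted))  # 3: one summary block plus the two most recent turns
```

In practice the stub would be replaced by a model call, and the summary would be refreshed each time the history is compacted again.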
Compaction introduces a fidelity-efficiency tradeoff: aggressive compression reduces token usage and latency but risks losing details the model needs, which can degrade answer quality or cause the model to hallucinate missing facts. The choice of what to compact and how aggressively depends on the task: a multi-turn coding agent may need precise earlier outputs, while a chatbot can tolerate coarser summaries.
Architectural alternatives to compaction include context isolation (giving each subagent a fresh, scoped context) and offloading information to agent memory for retrieval on demand rather than carrying it in every prompt. Prompt caching is a complementary optimization: it reduces the compute cost of repeated prefixes but does not reduce token count.
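The offloading alternative can be sketched as follows. All names here (`MemoryStore`, `put`, `fetch`) are illustrative rather than any specific framework's API: a large artifact lives in a store keyed by id, and the prompt carries only a short reference that a retrieval tool resolves on demand.

```python
# Offloading sketch: keep bulky content out of the prompt entirely and
# carry only a compact reference. A retrieval tool resolves the
# reference when the agent actually needs the content.

class MemoryStore:
    def __init__(self):
        self._items = {}

    def put(self, key, content):
        """Store content; return the short reference carried in the prompt."""
        self._items[key] = content
        return f"[memory:{key}] ({len(content)} chars, fetch on demand)"

    def fetch(self, key):
        """Resolve a reference back to the full content."""
        return self._items[key]

store = MemoryStore()
log_output = "ERROR line 1\n" + "DEBUG noise\n" * 500 + "ERROR line 2\n"
reference = store.put("build-log-17", log_output)
print(reference)  # short placeholder instead of ~6 KB of log text
```

Unlike compaction, nothing is lost here; the cost is an extra retrieval step whenever the offloaded content is needed.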
Examples
- Summarizing a long chat history into a short "state of the conversation" block prepended to the latest user message.
- Truncating tool call results (e.g., limiting file contents to the first N lines) before injecting them into context.
- Using an LLM call to distill a set of retrieved documents into a condensed passage before the main generation step.
- Replacing a full reasoning step with only its final conclusion.
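The second example, truncating tool results to the first N lines, is simple enough to show directly. A minimal sketch, assuming the tool output is plain text and that recording how much was cut helps the model know the result is partial:

```python
# Tool-output truncation sketch: cap text at the first N lines before
# injecting it into the prompt, and note how many lines were dropped.

def truncate_lines(text, max_lines=50):
    """Keep the first `max_lines` lines, appending a truncation marker."""
    lines = text.splitlines()
    if len(lines) <= max_lines:
        return text
    omitted = len(lines) - max_lines
    return "\n".join(lines[:max_lines]) + f"\n... [{omitted} more lines truncated]"

file_contents = "\n".join(f"line {i}" for i in range(1, 201))
print(truncate_lines(file_contents, max_lines=5))
# line 1 ... line 5, then "... [195 more lines truncated]"
```

A variant often used for logs keeps the first and last N lines, since errors tend to cluster at the end.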
Synonyms
context compaction, prompt compression, context compression, conversation summarization