Hallucination

A hallucination is output generated by an LLM that is fluent and plausible-sounding but factually incorrect or fabricated - not supported by the provided context or other verified sources.

Details

Hallucinations arise because language models produce text based on statistical patterns rather than verified knowledge, allowing them to generate confident statements with no basis in fact. Common contributing factors include ambiguous prompts, insufficient context, and queries outside the model's training distribution or beyond its knowledge cutoff. Grounding - anchoring model outputs in verifiable sources - is a primary mitigation.
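
A common grounding pattern is retrieval-constrained prompting: the model is shown retrieved passages and instructed to answer only from them, with an explicit refusal path when the passages are insufficient. The sketch below illustrates the idea; the function names, prompt wording, and refusal token are illustrative choices, not a standard API.

    # Sketch of retrieval-constrained prompting (names and prompt
    # wording are illustrative, not a standard API).

    def build_grounded_prompt(question: str, passages: list[str]) -> str:
        # Number the retrieved passages so the model can cite them.
        context = "\n\n".join(f"[{i}] {p}" for i, p in enumerate(passages, 1))
        return (
            "Answer the question using ONLY the numbered passages below. "
            "Cite a passage number for every claim. If the passages do not "
            "contain the answer, reply exactly: INSUFFICIENT CONTEXT.\n\n"
            f"{context}\n\nQuestion: {question}\nAnswer:"
        )

    def is_refusal(answer: str) -> bool:
        # Detect the explicit refusal so the caller can retrieve more
        # context or escalate, instead of accepting an ungrounded guess.
        return "INSUFFICIENT CONTEXT" in answer

A caller would route refusals back to retrieval or to human review rather than surfacing the model's best guess as an answer.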

In agent systems, hallucinations are particularly risky because they can propagate through multi-step workflows. A hallucinated fact can feed into tool calls (e.g., querying a database with a fabricated identifier), and the resulting errors can reinforce the original falsehood. In multi-agent systems, one agent's hallucinated output may be accepted as fact by downstream agents, compounding the error. Attackers can deliberately trigger hallucinations as an attack vector (see hallucination exploitation).
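
One mitigation at the tool boundary is to validate model-supplied arguments against known identifiers before a call executes, so a hallucinated value fails fast instead of propagating. A minimal sketch, assuming a hypothetical customer-lookup tool and an in-memory ID set (in practice, the valid IDs would come from the authoritative data store):

    # Sketch of an argument guard at the tool boundary; the tool name,
    # return schema, and ID set are hypothetical.

    KNOWN_CUSTOMER_IDS = {"cust-001", "cust-002"}  # loaded from the real DB in practice

    def guarded_customer_lookup(customer_id: str) -> dict:
        if customer_id not in KNOWN_CUSTOMER_IDS:
            # Fail fast with a structured error the agent can recover
            # from, rather than querying on a fabricated identifier.
            raise ValueError(f"unknown customer_id: {customer_id!r}")
        return {"customer_id": customer_id, "status": "active"}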

Examples

  • A model invents a plausible-looking but nonexistent citation (author, title, journal).
  • An agent hallucinates a package name, installs it via a tool call, and ends up executing a typosquatted malicious package registered under that name (see supply chain attack); a pre-install guard is sketched after this list.
  • One agent in a pipeline produces a fabricated statistic that downstream agents incorporate into reports without verification.
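
For the package-name scenario above, one hedged mitigation is a pre-install guard: check the name against an organizational allowlist and confirm it is actually registered on PyPI, whose JSON API returns 404 for unregistered names. The existence check catches invented names but not registered typosquats; the allowlist is what blocks those. The allowlist contents below are illustrative.

    # Sketch of a pre-install guard (ALLOWLIST contents illustrative).
    import urllib.error
    import urllib.request

    ALLOWLIST = {"requests", "numpy"}  # packages vetted by your organization

    def registered_on_pypi(name: str) -> bool:
        # PyPI's JSON API returns 404 for names that were never
        # registered - a strong hint the model invented the package.
        try:
            with urllib.request.urlopen(f"https://pypi.org/pypi/{name}/json") as resp:
                return resp.status == 200
        except urllib.error.URLError:
            return False

    def safe_to_install(name: str) -> bool:
        # The allowlist, not the existence check, is what blocks
        # registered typosquats.
        return name in ALLOWLIST and registered_on_pypi(name)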