Knowledge Cutoff

A knowledge cutoff is the end date of the training data window used to train an LLM. Events, publications, or changes after that date are absent from the model's parametric knowledge unless they are provided at inference time (for example via retrieval).

Details

The cutoff is determined by the data collection window used during pretraining. Continued pretraining or fine-tuning on newer data can shift the boundary for specific domains but does not broadly extend the model's general knowledge. Queries that fall beyond the cutoff are a common source of hallucination, because the model may generate plausible-sounding but outdated or fabricated answers rather than acknowledging ignorance.

RAG, grounding, and web search tools are the primary mitigations: by retrieving current information and injecting it into the context at inference time, applications can supply the model with knowledge it lacks from training. Inference providers typically publish each model's cutoff date so that application developers can assess whether retrieval augmentation is needed for their use case.
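The injection step described above can be sketched as follows. This is a minimal illustration, not a real library's API: `retrieve` and `llm_complete` are hypothetical stand-ins for a document search and a model completion call, passed in by the caller.

```python
def answer_with_retrieval(question: str, retrieve, llm_complete) -> str:
    """Ground a model's answer in documents fetched at inference time.

    `retrieve` and `llm_complete` are hypothetical callables standing in
    for a search backend and an LLM API, respectively.
    """
    # Fetch current documents relevant to the question.
    docs = retrieve(question, top_k=3)
    context = "\n\n".join(docs)
    # Inject the retrieved text into the prompt so the model can draw on
    # information newer than its knowledge cutoff.
    prompt = (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
    return llm_complete(prompt)
```

The key design point is that the retrieved text travels in the context window, not in the weights: the model's parametric knowledge is unchanged, but its effective knowledge for this one request now includes the injected documents.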

Examples

  • A model with a January 2025 cutoff cannot answer questions about events that occurred in March 2025 from its weights alone.
  • A coding agent suggests deprecated API patterns because its training data predates a library's breaking release.
  • A RAG-augmented assistant retrieves current documentation to compensate for the model's outdated training data.
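The first example above amounts to a date comparison an application could perform to decide whether retrieval is needed. A minimal sketch, assuming a hypothetical cutoff of January 2025 (illustrative, not tied to any real model):

```python
from datetime import date

# Illustrative cutoff date; real applications would read this from the
# inference provider's published model metadata.
MODEL_CUTOFF = date(2025, 1, 31)

def needs_retrieval(event_date: date, cutoff: date = MODEL_CUTOFF) -> bool:
    """Return True when a query concerns events after the model's cutoff,
    signalling that retrieval augmentation is required."""
    return event_date > cutoff
```

A query about a March 2025 event would return True here, while one about a December 2024 event would not.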

Synonyms

training cutoff, training data cutoff, knowledge cutoff date