Instruction Following

Instruction following is an LLM behavior where the model treats parts of the input (often system/developer/user messages in a prompt) as instructions and generates outputs that aim to satisfy those constraints.

Details

Instruction following is typically strengthened during post-training via instruction fine-tuning and preference-based methods (for example RLHF), and it strongly influences how "chat" models behave across multi-turn conversations. It is not the same as task capability: a model can understand an instruction yet fail to complete it due to missing knowledge, weak reasoning, or missing tools.

Because instruction following is a trained behavior that the model applies broadly across its input, it can be exploited when untrusted content is treated as instruction - for example a malicious instruction embedded in retrieved content (see prompt injection).

Examples

"Answer in JSON with keys ...", "Be brief", "Use the provided context only", "Do not reveal secrets".

Synonyms

instruction adherence, instruction compliance