Post-training
Post-training is an umbrella term for all training phases applied to a base model after pretraining, with the goal of shaping the model's behavior for specific tasks, formats, safety constraints, or interaction styles.
Details
Common post-training techniques include supervised fine-tuning (for example instruction tuning), preference-based alignment methods (for example RLHF and DPO, the latter of which avoids reinforcement learning entirely), and safety training. These stages typically use smaller, more curated datasets than pretraining.
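To make the supervised fine-tuning stage concrete, below is a minimal PyTorch sketch of the instruction-tuning objective: next-token cross-entropy computed only on response tokens, with prompt positions masked out. The `model` here is a hypothetical decoder-only network whose forward pass returns raw logits; names and the `-100` masking convention are illustrative assumptions, not a specific library's API.

```python
import torch.nn.functional as F

def sft_loss(model, input_ids, labels):
    """Next-token cross-entropy on response positions only.

    Assumes `labels` is a copy of `input_ids` in which every prompt
    position has been replaced with -100 (a hypothetical convention),
    so the loss is computed only where the model should learn the response.
    """
    logits = model(input_ids)              # (batch, seq_len, vocab); hypothetical model returning raw logits
    shift_logits = logits[:, :-1, :]       # position t predicts token t+1
    shift_labels = labels[:, 1:]
    return F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
        ignore_index=-100,                 # masked prompt tokens contribute no gradient
    )
```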
Post-training is what turns a base model into a deployable, instruction-following model with appropriate alignment, and it is a key part of how model developers prepare models for inference. Safety training in particular establishes model-level alignment such as refusal behaviors and content-policy adherence.
The boundaries between post-training techniques are fluid: some taxonomies treat instruction tuning and preference-based alignment as distinct stages, while others group them under the broader labels of fine-tuning or post-training.
Examples
- Instruction tuning a base model on prompt-response pairs to improve instruction following
- Applying RLHF (reinforcement learning from human feedback) to align a model with human preferences
- Applying DPO (direct preference optimization) as a simpler alternative to RLHF for preference-based alignment (see the sketch after this list)
- Safety training to reduce harmful or policy-violating outputs
- Fine-tuning on tool-call data to teach structured tool use
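For the DPO example above, here is a minimal sketch of the DPO loss from Rafailov et al. (2023), assuming the per-sequence log-probabilities of the chosen (preferred) and rejected responses have already been computed under both the policy being trained and a frozen reference model. Variable names are illustrative; `beta` controls how far the policy is allowed to drift from the reference.

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """-log sigmoid(beta * (chosen log-ratio - rejected log-ratio)).

    Each argument is a tensor of per-sequence log-probabilities
    (assumed precomputed); the loss pushes the policy to upweight
    chosen responses relative to rejected ones, measured against
    the frozen reference model.
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp        # how much the policy upweights the chosen response
    rejected_logratio = policy_rejected_logp - ref_rejected_logp  # same for the rejected response
    return -F.logsigmoid(beta * (chosen_logratio - rejected_logratio)).mean()
```

Because the loss needs only log-probabilities from two forward passes, DPO sidesteps the reward model and reinforcement-learning loop that RLHF requires, which is why it is often described as the simpler alternative.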