Sampling Parameters

Sampling parameters are inference-time settings that control the randomness and shape of the token probability distribution during LLM generation, without changing the model's weights.

Details

Common parameters include temperature (scales the logit distribution - lower values make the model more deterministic, higher values increase randomness), top-p / nucleus sampling (restricts sampling to the smallest set of tokens whose cumulative probability exceeds a threshold), top-k (restricts sampling to the k most probable tokens), and repetition penalties (frequency and presence penalties that discourage the model from repeating tokens it has already generated).

These parameters interact: for example, temperature reshapes the distribution before top-p or top-k filtering is applied. Logprobs expose the underlying distribution that these parameters act upon, capturing the model's raw token probabilities before sampling occurs. Sampling parameters are distinct from constrained decoding (see structured output), which also modifies the token distribution but enforces structural validity (e.g., valid JSON) rather than shaping diversity or quality.

Sampling parameter choices affect output quality, diversity, and reliability. Lower temperature and tighter top-p values produce more focused, reproducible outputs suited to tasks like structured output generation and tool calling. Higher values encourage diversity and creativity but increase the risk of hallucination and incoherent outputs. Inference providers expose these parameters through their APIs, and tuning them is a routine part of prompt development and eval configuration.

Examples

  • Setting temperature to 0 for near-deterministic tool call generation in an agent system (GPU floating-point non-determinism means identical prompts may occasionally produce slightly different outputs even at temperature 0).
  • Using top-p of 0.9 with moderate temperature for creative writing tasks.
  • Lowering temperature during evals to reduce variance across runs and produce more comparable results.

Synonyms

decoding parameters, generation parameters, temperature settings