Sampling & Temperature
A model predicts a distribution, not a word
After reading "The cat sat on the …", an LLM doesn't output one word. It produces a raw score (a logit) for every token in its vocabulary, and softmax turns those scores into a probability for each candidate. Decoding is the rule we use to turn that probability bar chart into an actual choice.
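To make the logit-to-probability step concrete, here is a minimal sketch of softmax in plain Python. The five-token vocabulary and its logits are made up for illustration; a real model scores tens of thousands of tokens.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for a toy vocabulary (not taken from any real model).
tokens = ["mat", "floor", "sofa", "roof", "box"]
logits = [4.0, 3.0, 2.5, 1.0, 0.0]

probs = softmax(logits)
for tok, p in zip(tokens, probs):
    print(f"{tok:>6}: {p:.3f}")
```

The printed column is the "probability bar chart" that every decoding rule starts from.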
Greedy decoding: always take the single highest-probability token. Deterministic, but repetitive and bland.
Sampling: draw a token at random, weighted by its probability. More varied, and the knobs below control how varied.
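The two rules can be sketched side by side. The probabilities below are invented for illustration, and the helper names `greedy` and `sample` are hypothetical, not from any library.

```python
import random

tokens = ["mat", "floor", "sofa", "roof", "box"]
probs = [0.55, 0.20, 0.15, 0.07, 0.03]  # hypothetical distribution

def greedy(tokens, probs):
    """Pick the single most likely token; same answer every time."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return tokens[best]

def sample(tokens, probs, rng):
    """Draw one token at random, weighted by its probability."""
    return rng.choices(tokens, weights=probs, k=1)[0]

rng = random.Random(0)  # seeded only so the run is reproducible
greedy_draws = [greedy(tokens, probs) for _ in range(5)]
sampled_draws = [sample(tokens, probs, rng) for _ in range(5)]
print("greedy: ", greedy_draws)
print("sampled:", sampled_draws)
```

Greedy returns "mat" on every call; the sampled list can contain any token with nonzero probability.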
Watch the distribution get reshaped
Same logits every time — only the decoding knob changes. Temperature scales the spread; top-k and top-p trim the long tail of unlikely tokens before we sample.
What each knob does
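Temperature: the logits are divided by T before softmax. T < 1 sharpens the distribution, so probability piles onto the top token and the model behaves almost greedily. T > 1 flattens it, so rarer tokens get a real chance. T = 1 leaves the distribution unchanged, and T → 0 is greedy.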
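Temperature is usually applied by dividing the logits by T before softmax. The sketch below assumes that convention and reuses made-up logits; `softmax_with_temperature` is a hypothetical helper name.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Softmax over logits / T. T must be positive; T -> 0 approaches greedy."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0 (use argmax for greedy)")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [4.0, 3.0, 2.5, 1.0, 0.0]  # hypothetical scores, most likely first

sharp = softmax_with_temperature(logits, 0.1)  # T < 1: sharpened
base = softmax_with_temperature(logits, 1.0)   # T = 1: unchanged
flat = softmax_with_temperature(logits, 2.5)   # T > 1: flattened

for name, dist in [("T=0.1", sharp), ("T=1.0", base), ("T=2.5", flat)]:
    print(name, [round(p, 3) for p in dist])
```

With these logits the top token holds nearly all the mass at T = 0.1 and well under half of it at T = 2.5, while the least likely token moves the other way.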
Top-k: throw away everything except the k most likely tokens, then renormalize and sample from those.
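A minimal sketch of the keep-k-then-renormalize step, using an invented distribution and a hypothetical helper `top_k_filter`:

```python
def top_k_filter(probs, k):
    """Zero out all but the k most likely entries, then renormalize."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Indices of the k largest probabilities.
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in keep)
    filtered = [0.0] * len(probs)
    for i in keep:
        filtered[i] = probs[i] / total
    return filtered

probs = [0.55, 0.20, 0.15, 0.07, 0.03]  # hypothetical distribution

top3 = top_k_filter(probs, 3)
top1 = top_k_filter(probs, 1)
print([round(p, 3) for p in top3])
print(top1)
```

With k = 1 only the most likely token survives, which is why top-k = 1 behaves like greedy decoding.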
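Top-p (nucleus sampling): keep the smallest set of top tokens whose probabilities sum to at least p (e.g. 0.9), then renormalize and sample from those. Adapts to how peaked the distribution is: few tokens survive when the model is confident, many when it is unsure.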
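The adaptive behaviour is easiest to see by filtering a peaked and a flat distribution with the same p. This is a sketch under the common convention that the token which crosses the threshold is kept; both distributions and the helper name `top_p_filter` are made up for illustration.

```python
def top_p_filter(probs, p):
    """Keep the smallest top set whose mass reaches p, then renormalize."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = [], 0.0
    for i in order:
        keep.append(i)          # the token that crosses p is included
        cumulative += probs[i]
        if cumulative >= p:
            break
    filtered = [0.0] * len(probs)
    for i in keep:
        filtered[i] = probs[i] / cumulative
    return filtered

peaked = [0.92, 0.04, 0.02, 0.01, 0.01]  # hypothetical: model is confident
flat = [0.35, 0.27, 0.20, 0.12, 0.06]    # hypothetical: model is unsure

peaked_kept = sum(1 for q in top_p_filter(peaked, 0.9) if q > 0)
flat_kept = sum(1 for q in top_p_filter(flat, 0.9) if q > 0)
print("survivors (peaked):", peaked_kept)
print("survivors (flat):  ", flat_kept)
```

The same p = 0.9 keeps one candidate in the peaked case and four in the flat case, whereas a fixed k would keep the same number in both.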
Need a factual, repeatable answer? Use low temperature (or greedy). Want creative, diverse writing? Raise the temperature and use top-p ≈ 0.9. Top-k and top-p are usually combined with temperature, not instead of it.
Now hold the dice yourself. Pick a knob, shape the distribution, then hit Sample ×10 to actually draw ten next-tokens from it. Green bars are the surviving candidates with their renormalized probabilities; faded bars can never be chosen.
Sample at T = 0.1 — ten draws, almost always ten "mat"s (that's why low temperature repeats itself). Now T = 2.5 — even "box" shows up. Then try top-k = 1: that's greedy decoding, no matter how often you roll.
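The same experiment can be approximated offline. The sketch below assumes temperature is applied first and top-k second (one common ordering); the logits, the tokens other than "mat" and "box", and the helper names are invented for illustration.

```python
import math
import random

def next_token_probs(logits, temperature=1.0, top_k=None):
    """Temperature-scaled softmax, optionally truncated to the top k tokens."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    weights = [math.exp(x - m) for x in scaled]
    if top_k is not None:
        # Weight of the k-th best token; ties at the cutoff are all kept.
        cutoff = sorted(weights, reverse=True)[top_k - 1]
        weights = [w if w >= cutoff else 0.0 for w in weights]
    total = sum(weights)
    return [w / total for w in weights]

def sample_ten(tokens, logits, rng, **knobs):
    """Draw ten next-tokens from the reshaped distribution."""
    probs = next_token_probs(logits, **knobs)
    return rng.choices(tokens, weights=probs, k=10)

tokens = ["mat", "floor", "sofa", "roof", "box"]
logits = [4.0, 3.0, 2.5, 1.0, 0.0]  # hypothetical scores
rng = random.Random(0)              # seeded for reproducibility

cold = sample_ten(tokens, logits, rng, temperature=0.1)
hot = sample_ten(tokens, logits, rng, temperature=2.5)
greedy_like = sample_ten(tokens, logits, rng, temperature=2.5, top_k=1)
print("T=0.1:  ", cold)
print("T=2.5:  ", hot)
print("top-k=1:", greedy_like)
```

At T = 0.1 "mat" carries more than 99% of the probability under these logits, so ten draws almost always agree. At T = 2.5 every token, including "box", has a noticeable chance. With top-k = 1 the draws are all "mat" regardless of temperature.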
Why trim the tail at all?
Pure sampling (no trimming):
- Every token has some chance, including nonsense
- One unlucky draw can derail the whole sentence
- Great for chaos, bad for coherence
With top-k or top-p:
- The absurd long tail is cut before drawing
- Still random among plausible tokens
- The sweet spot for most generation
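A rough back-of-the-envelope calculation shows why a small tail matters over a whole sentence. It assumes, purely for illustration, that every step has the same tail mass and that draws are independent; real distributions change from token to token.

```python
def chance_of_tail_draw(tail_mass, length):
    """Probability that at least one of `length` draws lands in the tail."""
    return 1.0 - (1.0 - tail_mass) ** length

# Hypothetical numbers: 5% of the mass sits on implausible tokens,
# and we generate 50 tokens.
untrimmed = chance_of_tail_draw(0.05, 50)
trimmed = chance_of_tail_draw(0.0, 50)  # tail removed by top-k / top-p

print(f"untrimmed: {untrimmed:.2f}")
print(f"trimmed:   {trimmed:.2f}")
```

Under these assumptions a 5% tail per step gives roughly a 92% chance of at least one tail token in 50 steps, while trimming the tail brings that to zero and leaves the randomness among the plausible candidates.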