Bias–Variance Tradeoff
Splitting up the error
A model's expected error on new data breaks into three pieces: bias², variance, and irreducible noise.
Bias: error from overly simple assumptions. A high-bias model misses the true pattern no matter how much data it sees.
Variance: error from sensitivity to the particular training sample. A high-variance model changes drastically when it is retrained on a different sample of the data.
Noise: randomness in the world itself. No model can reduce it.
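For readers who want the formula, the three pieces can be written out under squared-error loss. The notation is an assumption of this sketch and does not come from the text above: data are generated as y = f(x) + ε with zero-mean noise of variance σ², f̂ is the model fitted to a random training sample, and the expectation runs over training samples and noise.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{noise}}
```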
The dartboard analogy
Picture retraining the same model on many different training samples, with each trained model landing as one dart on a target. The bullseye is the truth. Bias is how far the centre of the cluster sits from the bullseye; variance is how spread out the darts are around that centre.
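The dartboard picture can be simulated directly. The sketch below is only an illustration under stated assumptions: the truth is a sine wave, the models are polynomials fitted with NumPy, the training inputs stay fixed while the noise is redrawn for each "dart", and the names true_fn and bias_variance are hypothetical.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def true_fn(x):
    # Assumed ground truth for this sketch: one period of a sine wave.
    return np.sin(2 * np.pi * x)


def bias_variance(degree, n_train=10, n_models=500, noise_sd=0.3, seed=0):
    """Estimate the average bias^2 and variance of a polynomial fit."""
    rng = np.random.default_rng(seed)
    x_train = np.linspace(0.0, 1.0, n_train)  # fixed inputs; only the noise is redrawn
    x_eval = np.linspace(0.05, 0.95, 50)      # points where the darts are judged

    preds = np.empty((n_models, x_eval.size))
    for i in range(n_models):
        # One dart: a fresh noisy training sample and the model fitted to it.
        y_train = true_fn(x_train) + rng.normal(0.0, noise_sd, n_train)
        coeffs = P.polyfit(x_train, y_train, degree)
        preds[i] = P.polyval(x_eval, coeffs)

    mean_pred = preds.mean(axis=0)                          # centre of the cluster
    bias_sq = np.mean((mean_pred - true_fn(x_eval)) ** 2)   # centre vs. bullseye
    variance = np.mean(preds.var(axis=0))                   # spread around the centre
    return float(bias_sq), float(variance)


if __name__ == "__main__":
    for degree in (1, 3, 9):
        b, v = bias_variance(degree)
        print(f"degree {degree}: bias^2 = {b:.4f}, variance = {v:.4f}")
```

In this setup the straight line should show the largest bias² and the smallest variance, and the degree-9 fit the reverse. The exact numbers depend on the assumed noise level and seed.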
Why it's a tradeoff
As you make a model more complex, it fits the training data more closely: bias falls, but variance rises. As you make it simpler, variance falls but bias rises. Total error is a U-shape — the goal is the bottom of the U.
High bias = underfitting. High variance = overfitting. The tradeoff is the formal version of that same idea.
Here is the U-curve made concrete. Ten training points (dots) and eight held-out test points (diamonds) are drawn from the same noisy wave, and a polynomial of the chosen degree is fitted to the training points. As the degree rises, the training error keeps falling toward zero, while the test error falls at first and then turns back up.
Degree 1 can't bend — it misses the wave on both sets (bias). Degree 3–4 tracks the wave and the two errors agree (the sweet spot). Degree 8–9 threads every training dot, wiggling violently between them — train error ≈ 0 while test error explodes (variance).
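The experiment described above can be approximated in a few lines. This is a sketch, not the original demo: the sine-wave shape, the noise level, the seed, and the function name degree_sweep are all assumptions, so the numbers will differ from the ones the text refers to.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def degree_sweep(max_degree=9, n_train=10, n_test=8, noise_sd=0.2, seed=0):
    """Fit polynomials of degree 1..max_degree; return {degree: (train_mse, test_mse)}."""
    rng = np.random.default_rng(seed)

    def wave(x):
        # Assumed "noisy wave": one sine period plus Gaussian noise.
        return np.sin(2 * np.pi * x) + rng.normal(0.0, noise_sd, x.shape)

    x_train = np.linspace(0.0, 1.0, n_train)   # evenly spaced training inputs
    x_test = rng.uniform(0.0, 1.0, n_test)     # held-out inputs from the same range
    y_train, y_test = wave(x_train), wave(x_test)

    errors = {}
    for degree in range(1, max_degree + 1):
        coeffs = P.polyfit(x_train, y_train, degree)
        train_mse = np.mean((P.polyval(x_train, coeffs) - y_train) ** 2)
        test_mse = np.mean((P.polyval(x_test, coeffs) - y_test) ** 2)
        errors[degree] = (float(train_mse), float(test_mse))
    return errors


if __name__ == "__main__":
    for degree, (train_mse, test_mse) in degree_sweep().items():
        print(f"degree {degree}: train MSE = {train_mse:.4f}, test MSE = {test_mse:.4f}")
```

The training error can only go down as the degree grows, because each polynomial family contains the previous one, and with ten points a degree-9 polynomial passes through all of them. The test error carries no such guarantee, and where it bottoms out will vary with the seed.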
Turning the knobs
To lower bias (usually at the cost of more variance):
- More complex model / more features
- Less regularization
- Boosting (builds up complexity)
To lower variance:
- More training data
- More regularization
- Bagging / averaging (e.g. Random Forest)
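The regularization knob from the list above can be tried out with ridge regression on polynomial features. The sketch rests on several assumptions: a sine-wave truth, fixed training inputs, a closed-form ridge solution that also penalizes the intercept for brevity, and a hypothetical function name ridge_bias_variance. The penalty lam should be positive, since the unregularized system is badly conditioned.

```python
import numpy as np


def ridge_bias_variance(lam, degree=9, n_train=10, n_models=500, noise_sd=0.3, seed=0):
    """Average bias^2 and variance of a ridge-regularized polynomial fit."""
    rng = np.random.default_rng(seed)
    x_train = np.linspace(0.0, 1.0, n_train)
    x_eval = np.linspace(0.05, 0.95, 50)

    # Polynomial features 1, x, x^2, ..., x^degree.
    A = np.vander(x_train, degree + 1, increasing=True)
    A_eval = np.vander(x_eval, degree + 1, increasing=True)

    # Ridge closed form: w = (A^T A + lam * I)^(-1) A^T y.
    # The intercept is penalized too, to keep the sketch short.
    to_weights = np.linalg.solve(A.T @ A + lam * np.eye(degree + 1), A.T)

    # Each row of Y is one noisy training sample on the same inputs.
    Y = np.sin(2 * np.pi * x_train) + rng.normal(0.0, noise_sd, (n_models, n_train))
    preds = (A_eval @ to_weights @ Y.T).T  # shape (n_models, 50)

    truth_eval = np.sin(2 * np.pi * x_eval)
    bias_sq = np.mean((preds.mean(axis=0) - truth_eval) ** 2)
    variance = np.mean(preds.var(axis=0))
    return float(bias_sq), float(variance)


if __name__ == "__main__":
    for lam in (1e-4, 1e-2, 1.0):
        b, v = ridge_bias_variance(lam)
        print(f"lambda {lam:g}: bias^2 = {b:.4f}, variance = {v:.4f}")
```

Raising lam should pull the variance down and push the bias² up, and lowering it should do the opposite.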
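More data lowers variance without raising bias, which is why "get more data" is the most reliable fix for an overfitting model. It does little for an underfitting one: a model that is too simple stays biased no matter how much data it sees.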
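The effect of more data can be checked by holding the model fixed and growing the training set. As before this is a sketch under assumptions: a sine-wave truth, evenly spaced inputs, a degree-5 polynomial, and a hypothetical function name bias_variance_at_n.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def bias_variance_at_n(n_train, degree=5, n_models=500, noise_sd=0.3, seed=0):
    """Average bias^2 and variance of a fixed-degree fit trained on n_train points."""
    rng = np.random.default_rng(seed)
    x_train = np.linspace(0.0, 1.0, n_train)
    x_eval = np.linspace(0.05, 0.95, 50)

    # Column k of Y is the k-th noisy training sample (assumed sine-wave truth).
    noise = rng.normal(0.0, noise_sd, (n_train, n_models))
    Y = np.sin(2 * np.pi * x_train)[:, None] + noise

    # polyfit accepts a 2-D y and fits every column at once.
    coeffs = P.polyfit(x_train, Y, degree)                   # (degree + 1, n_models)
    features = np.vander(x_eval, degree + 1, increasing=True)
    preds = (features @ coeffs).T                            # (n_models, 50)

    bias_sq = np.mean((preds.mean(axis=0) - np.sin(2 * np.pi * x_eval)) ** 2)
    variance = np.mean(preds.var(axis=0))
    return float(bias_sq), float(variance)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        b, v = bias_variance_at_n(n)
        print(f"n = {n}: bias^2 = {b:.5f}, variance = {v:.5f}")
```

The variance should shrink roughly in proportion to 1/n while the bias² stays about where it was, since the bias is set by the model family rather than by the sample size.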