Bias–Variance Tradeoff · Suman Bhadra Notes

Splitting up the error

A model's expected error on new data breaks into three pieces: bias², variance, and irreducible noise.
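Written out for squared-error loss, the split looks like the identity below. It assumes the data follow y = f(x) + ε with zero-mean noise of variance σ². The symbols f, f̂, ε and σ² are notation introduced here for illustration, and the expectation runs over training sets and noise.

```latex
\mathbb{E}\big[(y - \hat f(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat f(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat f(x) - \mathbb{E}[\hat f(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```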

Bias: wrong on average

Error from overly simple assumptions. A high-bias model misses the true pattern no matter how much data it sees.

Variance: unstable

Error from sensitivity to the particular training sample. A high-variance model changes its predictions sharply when it is retrained on a different sample from the same source.

Irreducible noise: unavoidable

Randomness in the world itself. No model can do anything about it.

The dartboard analogy

Picture each trained model as a dart thrown at a target, with one dart for every training sample you could have drawn. The bullseye is the truth. Bias is how far the centre of the cluster sits from the bullseye; variance is how spread out the darts are.

Why it's a tradeoff

As you make a model more complex, it fits the training data more closely: bias falls, but variance rises. As you make it simpler, variance falls but bias rises. Plotted against complexity, total error is U-shaped, and the goal is the bottom of the U.
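One way to see both terms move is to refit the same model on many fresh training sets and measure the average miss and the spread of the predictions. The sketch below does that for polynomial fits. The sine-wave target, the noise level, the sample sizes and every name in it are assumptions made for illustration, and the inputs are held fixed so that only the noise changes between training sets.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

NOISE_SD = 0.3     # assumed noise level (irreducible noise = NOISE_SD ** 2)
N_TRAIN = 30       # points per training set
N_REPEATS = 500    # number of independent training sets


def true_fn(x):
    # Assumed ground truth: one period of a sine wave on [0, 1].
    return np.sin(2 * np.pi * x)


x_train = np.linspace(0, 1, N_TRAIN)   # inputs held fixed; only the noise is redrawn
x_eval = np.linspace(0, 1, 101)        # where bias and variance are measured


def bias_variance(degree):
    """Monte Carlo estimate of (bias^2, variance), averaged over x_eval."""
    preds = np.empty((N_REPEATS, x_eval.size))
    for r in range(N_REPEATS):
        y_train = true_fn(x_train) + rng.normal(0, NOISE_SD, N_TRAIN)
        preds[r] = Polynomial.fit(x_train, y_train, degree)(x_eval)
    mean_pred = preds.mean(axis=0)
    # Bias^2: squared gap between the average prediction and the truth.
    bias_sq = float(np.mean((mean_pred - true_fn(x_eval)) ** 2))
    # Variance: spread of the predictions around their own average.
    variance = float(np.mean(preds.var(axis=0)))
    return bias_sq, variance


results = {d: bias_variance(d) for d in (1, 3, 9)}
for d, (b, v) in results.items():
    print(f"degree {d}: bias^2 = {b:.4f}, variance = {v:.4f}")
```

Under these assumptions the bias² column should shrink and the variance column should grow as the degree goes from 1 to 9. Adding NOISE_SD ** 2 to each row gives an estimate of the expected test error.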

The connection

High bias = underfitting. High variance = overfitting. The tradeoff is the formal version of that same idea.

Here is the U-curve made concrete. Ten training points and eight held-out test points are drawn from the same noisy wave, and a polynomial of a chosen degree is fitted to the training points. As the degree rises, the train error keeps falling, reaching roughly zero at degree 9 where the curve passes through all ten points, while the test error turns back up.

Degree 1 can't bend — it misses the wave on both sets (bias). Degree 3–4 tracks the wave and the two errors agree (the sweet spot). Degree 9 threads every training dot, wiggling violently between them — train error ≈ 0 while test error explodes (variance).
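The experiment described above can be approximated in a few lines. This is a sketch and not the page's original demo: the sine wave, the noise level, the random seed and the helper names are all assumed here, so the exact numbers will differ from the original plot.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(1)


def noisy_wave(x):
    # Assumed data source: a sine wave plus Gaussian noise.
    return np.sin(2 * np.pi * x) + rng.normal(0, 0.2, x.size)


# Ten training points and eight held-out test points from the same source.
x_train = np.linspace(0, 1, 10)
y_train = noisy_wave(x_train)
x_test = rng.uniform(0, 1, 8)
y_test = noisy_wave(x_test)


def mse(model, x, y):
    """Mean squared error of a fitted polynomial on (x, y)."""
    return float(np.mean((model(x) - y) ** 2))


errors = {}
for degree in range(1, 10):
    model = Polynomial.fit(x_train, y_train, degree)  # least-squares fit
    errors[degree] = (mse(model, x_train, y_train), mse(model, x_test, y_test))
    print(f"degree {degree}: train = {errors[degree][0]:.4f}, "
          f"test = {errors[degree][1]:.4f}")
```

The train error cannot go up as the degree increases, because each lower-degree fit is a special case of the next one. The test column is where the U should appear, though with only eight test points its exact shape depends on the seed.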

Turning the knobs

Lower bias (if underfitting)

More complex model / more features
Less regularization
Boosting (builds up complexity)

Lower variance (if overfitting)

More training data
More regularization
Bagging / averaging (e.g. Random Forest)
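The regularization knob from both lists can be tried directly. The sketch below fits a degree-9 polynomial with ridge (L2) regularization at three strengths and estimates bias² and variance over repeated noisy training sets. Ridge is chosen here as one example of regularization. The target function, noise level, penalty values and helper names are assumptions, and the intercept is penalized along with the other coefficients to keep the code short.

```python
import numpy as np
from numpy.polynomial.polynomial import polyvander

rng = np.random.default_rng(2)
NOISE_SD, DEGREE, N_REPEATS = 0.3, 9, 300   # assumed settings

x_train = np.linspace(0, 1, 15)
x_eval = np.linspace(0, 1, 101)


def true_fn(x):
    # Assumed ground truth: one period of a sine wave.
    return np.sin(2 * np.pi * x)


def features(x):
    # Monomial features 1, t, ..., t^9 with t = 2x - 1 rescaled to [-1, 1].
    return polyvander(2 * x - 1, DEGREE)


X_train, X_eval = features(x_train), features(x_eval)


def ridge_fit(X, y, lam):
    # Closed-form ridge solution: solve (X^T X + lam * I) w = X^T y.
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)


def bias_variance(lam):
    """Monte Carlo estimate of (bias^2, variance) for penalty strength lam."""
    preds = np.empty((N_REPEATS, x_eval.size))
    for r in range(N_REPEATS):
        y = true_fn(x_train) + rng.normal(0, NOISE_SD, x_train.size)
        preds[r] = X_eval @ ridge_fit(X_train, y, lam)
    bias_sq = float(np.mean((preds.mean(axis=0) - true_fn(x_eval)) ** 2))
    variance = float(np.mean(preds.var(axis=0)))
    return bias_sq, variance


summary = {lam: bias_variance(lam) for lam in (1e-6, 1e-2, 1.0)}
for lam, (b, v) in summary.items():
    print(f"lambda = {lam:g}: bias^2 = {b:.4f}, variance = {v:.4f}")
```

With a near-zero penalty the fit is close to plain least squares: low bias, high variance. Raising the penalty should trade some of that variance for bias, which is the same knob turned in the opposite direction from "less regularization".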

The escape hatch

More data lowers variance without raising bias, which is why "get more data" is the most reliable fix for overfitting. It does not help an underfitting model, because the bias stays where it was.
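The effect of sample size can be checked with the same kind of simulation. In this sketch the model is held fixed (a cubic polynomial) while the training set grows. As before, the target function, noise level, sizes and names are illustrative assumptions.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(3)
NOISE_SD, DEGREE, N_REPEATS = 0.3, 3, 300   # assumed settings
x_eval = np.linspace(0, 1, 101)


def true_fn(x):
    # Assumed ground truth: one period of a sine wave.
    return np.sin(2 * np.pi * x)


def bias_variance(n_train):
    """Monte Carlo estimate of (bias^2, variance) for a cubic fit on n_train points."""
    x_train = np.linspace(0, 1, n_train)
    preds = np.empty((N_REPEATS, x_eval.size))
    for r in range(N_REPEATS):
        y_train = true_fn(x_train) + rng.normal(0, NOISE_SD, n_train)
        preds[r] = Polynomial.fit(x_train, y_train, DEGREE)(x_eval)
    bias_sq = float(np.mean((preds.mean(axis=0) - true_fn(x_eval)) ** 2))
    variance = float(np.mean(preds.var(axis=0)))
    return bias_sq, variance


by_size = {n: bias_variance(n) for n in (20, 80, 320)}
for n, (b, v) in by_size.items():
    print(f"n = {n}: bias^2 = {b:.4f}, variance = {v:.4f}")
```

Under these assumptions the variance should drop by roughly a factor of four each time the sample size quadruples, while bias² stays about the same, since a cubic cannot match the wave exactly however many points it sees.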