Hyperparameter Tuning

ML · model selection · grid search · random search

Parameters vs hyperparameters

A model learns its parameters (weights) from data. Its hyperparameters — the dials you set before training — control how that learning happens, and you have to choose them yourself.

Parameters (learned)

Regression coefficients, network weights — fit by the training process.

Hyperparameters (chosen)

Tree depth, K in KNN, learning rate, regularisation strength — set by you.
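
To make the distinction concrete, here is a minimal sketch using a toy one-dimensional ridge regression with no intercept. The function name fit_ridge_1d and the data are made up for illustration. The regularisation strength alpha is passed in by you (a hyperparameter), while the weight w comes out of the fit (a parameter).

```python
# Toy 1-D ridge regression with no intercept: y is modelled as w * x.
# alpha is a hyperparameter (you choose it); w is a parameter (the fit returns it).

def fit_ridge_1d(xs, ys, alpha):
    """Return the w that minimises sum((y - w*x)**2) + alpha * w**2."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + alpha)  # closed-form solution

# Illustrative data, roughly y = 2x
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

for alpha in (0.0, 1.0, 10.0):
    w = fit_ridge_1d(xs, ys, alpha)
    print(f"alpha={alpha:>4}: learned w={w:.3f}")
```

Changing alpha changes the w that the fit returns: stronger regularisation pulls the weight towards zero. The data never tells you which alpha to use; that is what tuning is for.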

The goal

Find the hyperparameter combination that gives the best cross-validated score — without ever touching the final test set.
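
One way this workflow could look, sketched with a tiny hand-written k-NN regressor and 5-fold cross-validation. Everything here (the helper names, the synthetic data, the candidate values of k) is an assumption made for illustration. The score is mean squared error, so lower is better.

```python
import random

def knn_predict(train, x, k):
    """Average the targets of the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def mse(train, held_out, k):
    """Mean squared error of k-NN fitted on train, evaluated on held_out."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in held_out) / len(held_out)

def cv_score(data, k, n_folds=5):
    """Mean validation MSE over n_folds folds (lower is better)."""
    folds = [data[i::n_folds] for i in range(n_folds)]
    errors = []
    for i, val in enumerate(folds):
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        errors.append(mse(train, val, k))
    return sum(errors) / n_folds

# Synthetic data: y = x^2 plus noise (illustrative only)
rng = random.Random(0)  # fixed seed so the run is reproducible
data = []
for _ in range(120):
    x = rng.uniform(-3, 3)
    data.append((x, x * x + rng.gauss(0, 1.0)))

# Put the test set aside before any tuning happens.
dev, test = data[:100], data[100:]

candidates = [1, 3, 5, 9, 15, 25]
scores = {k: cv_score(dev, k) for k in candidates}
best_k = min(scores, key=scores.get)

# Only after k is fixed is the test set used, exactly once.
test_mse = mse(dev, test, best_k)
print(f"best k={best_k}, CV MSE={scores[best_k]:.3f}, test MSE={test_mse:.3f}")
```

The cross-validated score picks k; the test score is the number you report.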

Grid search vs random search

Picture the cross-validated score as a landscape over two hyperparameters, with a sweet spot where the score peaks. Grid search probes a regular lattice of points; random search scatters the same number of points across the space. On the same budget, random search often lands closer to the sweet spot.

The search strategies

Grid search (every combo)

Try all combinations of a fixed set of values. Thorough, but the count explodes combinatorially: 5 values for each of 4 dials is 5^4 = 625 combinations, and each one is fitted once per CV fold.
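
The arithmetic can be checked by enumerating a grid directly. The sketch below assumes a hypothetical grid of four dials with five candidate values each; the dial names and values are placeholders, not recommendations.

```python
from itertools import product

# Hypothetical grid: 4 dials, 5 candidate values each.
grid = {
    "max_depth": [2, 4, 6, 8, 10],
    "learning_rate": [0.001, 0.003, 0.01, 0.03, 0.1],
    "subsample": [0.5, 0.6, 0.7, 0.8, 0.9],
    "reg_lambda": [0.01, 0.1, 1.0, 10.0, 100.0],
}

# Every combination of one value per dial.
combos = [dict(zip(grid, values)) for values in product(*grid.values())]

n_folds = 5
print(len(combos))            # 5 ** 4 = 625 combinations
print(len(combos) * n_folds)  # 3125 model fits under 5-fold CV
```

Adding a fifth dial with five values would multiply the count by five again, which is why a full grid is rarely practical beyond a handful of dials.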

Random search (sample N)

Sample N random combinations from the search space. With the same budget it tries more distinct values of each dial, so it often finds a better spot sooner.
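
A minimal sketch of random search with a fixed trial budget. The ranges in sample_config and the toy_score function are assumptions standing in for a real search space and a real cross-validated score.

```python
import math
import random

def sample_config(rng):
    """Draw one random combination (ranges are illustrative assumptions)."""
    return {
        "max_depth": rng.randint(2, 10),
        "learning_rate": 10 ** rng.uniform(-3, -1),  # log-uniform in [1e-3, 1e-1]
        "subsample": rng.uniform(0.5, 1.0),
    }

def random_search(score_fn, n_trials, seed=0):
    """Evaluate n_trials random configs and return the best (higher is better)."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_trials):
        cfg = sample_config(rng)
        score = score_fn(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

def toy_score(cfg):
    """Stand-in for a CV score: peaks at learning_rate=1e-2 and max_depth=6."""
    lr_term = (math.log10(cfg["learning_rate"]) + 2) ** 2
    depth_term = 0.01 * (cfg["max_depth"] - 6) ** 2
    return -(lr_term + depth_term)

best_cfg, best_score = random_search(toy_score, n_trials=25)
print(best_cfg, round(best_score, 4))
```

The budget is the single argument n_trials, so the cost stays fixed no matter how many dials are added to sample_config.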

Bayesian / Optuna (smart search)

Use past results to decide where to look next. Most efficient when each training run is expensive.
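
The idea of letting earlier results steer later trials can be shown with a deliberately crude stand-in. This is not Bayesian optimisation and not how Optuna works internally: those fit a probabilistic model to the trial history. The sketch below only explores at random for a few warm-up trials, then samples near the best point seen so far. The function guided_search and the toy objective are hypothetical.

```python
import random

def guided_search(score_fn, low, high, n_trials=30, n_warmup=10, seed=0):
    """Maximise score_fn on [low, high], using past trials to place later ones."""
    rng = random.Random(seed)
    history = []  # (score, x) for every trial so far
    for trial in range(n_trials):
        if trial < n_warmup:
            # Explore: nothing is known yet, so sample uniformly.
            x = rng.uniform(low, high)
        else:
            # Exploit: sample near the best point found so far.
            _, best_x = max(history)
            x = rng.gauss(best_x, 0.1 * (high - low))
            x = min(max(x, low), high)  # clamp to the search range
        history.append((score_fn(x), x))
    best_score, best_x = max(history)
    return best_x, best_score, history

def toy_score(x):
    """Stand-in for an expensive training run; peaks at x = 0.3."""
    return -(x - 0.3) ** 2

best_x, best_score, history = guided_search(toy_score, 0.0, 1.0)
print(round(best_x, 3), round(best_score, 5))
```

A real Bayesian optimiser also balances exploring uncertain regions against exploiting known good ones, which this sketch does not attempt.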

Why random often wins

Usually only one or two hyperparameters really matter. Grid search re-tests the same few values of the important dial while it steps through the unimportant one; random search draws a fresh value of the important dial on every evaluation.
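
The argument can be checked by counting. In the sketch below, both strategies get 9 evaluations over two dials, and the first dial is assumed to be the important one. The values and the seed are arbitrary.

```python
import random
from itertools import product

budget = 9

# Grid: 3 values per dial gives 9 combinations,
# but only 3 distinct values of the important dial are ever tested.
grid_vals = [0.0, 0.5, 1.0]
grid_points = list(product(grid_vals, grid_vals))

# Random: 9 combinations, each with its own draw for both dials.
rng = random.Random(0)
random_points = [(rng.random(), rng.random()) for _ in range(budget)]

# Count distinct values tried for the first (important) dial.
distinct_grid = len({a for a, _ in grid_points})
distinct_random = len({a for a, _ in random_points})
print(distinct_grid, distinct_random)
```

With the same 9 evaluations, the grid covers 3 values of the important dial, while the random draws cover as many distinct values as there are trials (barring an exact floating-point collision).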

Doing it right

Do
  • Tune against a validation set / CV, not the test set
  • Search learning rate & regularisation on a log scale
  • Start coarse, then zoom into the best region
  • Use early stopping to cut wasted runs
Don't
  • Report the tuning score as final performance
  • Grid-search a huge space blindly
  • Forget to fix the random seed for reproducibility
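
Two items from the checklist above can be sketched in a few lines: sampling on a log scale with a fixed seed, and a patience-based early-stopping rule. The helper names, the range 1e-5 to 1e-1, the patience of 3, and the loss values are all illustrative assumptions.

```python
import math
import random

rng = random.Random(42)  # fixed seed: the same candidates are drawn on every run

def log_uniform(rng, low, high):
    """Sample so that every decade between low and high is equally likely."""
    return 10 ** rng.uniform(math.log10(low), math.log10(high))

learning_rates = [log_uniform(rng, 1e-5, 1e-1) for _ in range(1000)]
below_1e3 = sum(lr < 1e-3 for lr in learning_rates) / len(learning_rates)
# Roughly half the samples fall below 1e-3, i.e. in 2 of the 4 decades.
print(f"share below 1e-3: {below_1e3:.2f}")

def stopping_epoch(val_losses, patience=3):
    """Return the epoch at which training stops under early stopping."""
    best, best_epoch = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch  # no improvement for `patience` epochs in a row
    return len(val_losses) - 1  # ran to the end without triggering

# Made-up validation losses: best at epoch 3, then slowly worse.
losses = [0.90, 0.70, 0.60, 0.58, 0.59, 0.60, 0.61, 0.62, 0.63]
print(stopping_epoch(losses))  # 6: three epochs after the best at epoch 3
```

A plain uniform draw over the same range would put about 99% of samples above 1e-3, leaving the small learning rates almost untested. That is why the log scale matters for dials that span several orders of magnitude.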