Hyperparameter Tuning
Parameters vs hyperparameters
A model learns its parameters (weights) from data. Its hyperparameters — the dials you set before training — control how that learning happens, and you have to choose them yourself.
Parameters: regression coefficients, network weights. Fit by the training process.
Hyperparameters: tree depth, K in KNN, learning rate, regularisation strength. Set by you.
The goal: find the hyperparameter combination that gives the best cross-validated score, without ever touching the final test set.
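As a rough illustration of that workflow, here is a minimal sketch that tunes K for a hand-written 1-D KNN classifier using 5-fold cross-validation. The toy data, the candidate values of K, and all helper names (`knn_predict`, `accuracy`, `cv_score`) are assumptions made up for this example, not part of any library.

```python
import random

def knn_predict(train, x, k):
    """Majority vote among the k training points nearest to x (1-D features)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0

def accuracy(train, held_out, k):
    """Fraction of held_out points that a KNN fitted on train labels correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in held_out)
    return hits / len(held_out)

def cv_score(data, k, n_folds=5):
    """Mean validation accuracy over n_folds train/validation splits of data."""
    scores = []
    for i in range(n_folds):
        val = data[i::n_folds]
        train = [p for j, p in enumerate(data) if j % n_folds != i]
        scores.append(accuracy(train, val, k))
    return sum(scores) / n_folds

rng = random.Random(0)
# Toy data (an assumption): label is 1 when x > 0.5, with 15% of labels flipped.
data = []
for _ in range(200):
    x = rng.random()
    y = int(x > 0.5)
    if rng.random() < 0.15:
        y = 1 - y
    data.append((x, y))

# Hold the test set out before any tuning happens.
test, dev = data[:50], data[50:]

# Odd values of K only, so the majority vote cannot tie.
candidates = [1, 3, 5, 9, 15, 25]
cv_scores = {k: cv_score(dev, k) for k in candidates}
best_k = max(cv_scores, key=cv_scores.get)

# The test set is touched exactly once, after best_k is fixed.
test_score = accuracy(dev, test, best_k)
print(best_k, round(cv_scores[best_k], 3), round(test_score, 3))
```

The cross-validated score picks the winner; the test score is only computed afterwards and is the number to report.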
Grid search vs random search
The score landscape below has a sweet spot (bright zone). Watch grid search probe a regular lattice, then random search scatter points — and notice why random often finds a better spot on the same budget.
The search strategies
Grid search: try all combinations of a fixed set of values. Thorough, but the number of fits explodes combinatorially: 5 values for each of 4 dials gives 5^4 = 625 fits.
Random search: sample random combinations. With the same budget it explores more distinct values per dial, and it often finds a better spot faster.
Adaptive search (for example Bayesian optimisation): use past results to decide where to look next. Most efficient when each training run is expensive.
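The difference in budget between the first two strategies can be checked with a few lines of standard-library Python. The search space below (dial names and values) is hypothetical and chosen only so that there are 4 dials with 5 values each; the random-search budget of 60 is likewise an arbitrary assumption.

```python
import itertools
import random

# Hypothetical search space: 4 dials with 5 candidate values each.
grid = {
    "max_depth": [2, 4, 6, 8, 10],
    "min_samples_leaf": [1, 2, 5, 10, 20],
    "learning_rate": [0.001, 0.003, 0.01, 0.03, 0.1],
    "l2_penalty": [0.0001, 0.001, 0.01, 0.1, 1.0],
}

# Grid search: every combination, so the count is 5 * 5 * 5 * 5 = 5**4.
grid_configs = [
    dict(zip(grid, combo)) for combo in itertools.product(*grid.values())
]
print(len(grid_configs))  # 625

# Random search: a fixed budget of configurations, however many dials there are.
rng = random.Random(0)
budget = 60
random_configs = [
    {name: rng.choice(values) for name, values in grid.items()}
    for _ in range(budget)
]
print(len(random_configs))  # 60
```

Adding a fifth dial with 5 values would multiply the grid to 3125 fits, while the random-search budget stays wherever you set it. In practice random search can also draw from continuous ranges instead of fixed lists.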
Usually only one or two hyperparameters really matter. Grid search wastes evaluations on fine variations of the unimportant dial; random search spends those same evaluations sampling more values of the important one.
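A small simulation makes this concrete. It is a sketch under strong assumptions: the `score` function is a made-up landscape in which only the first dial matters at all, and its peak at 0.37 is an arbitrary position that does not fall on the lattice. With a peak sitting exactly on a grid value, grid search would win instead.

```python
import random

def score(important, unimportant):
    """Toy landscape (an assumption): only the first dial matters, peak at 0.37."""
    return 1.0 - (important - 0.37) ** 2

# Grid search: a 4 x 4 lattice over the unit square, 16 fits in total.
axis = [0.0, 1 / 3, 2 / 3, 1.0]
grid_points = [(a, b) for a in axis for b in axis]
budget = len(grid_points)

def random_points(rng, n):
    """Random search: n points drawn uniformly from the unit square."""
    return [(rng.random(), rng.random()) for _ in range(n)]

def best(points):
    return max(score(a, b) for a, b in points)

rng = random.Random(0)
sample = random_points(rng, budget)

# How many different values of the important dial did each strategy try?
grid_distinct = len({a for a, _ in grid_points})
random_distinct = len({a for a, _ in sample})
print(grid_distinct, random_distinct)

# Repeat random search with fresh draws to see how often it beats the grid.
trials = 500
wins = sum(
    best(random_points(rng, budget)) > best(grid_points) for _ in range(trials)
)
print(f"random search beat the grid in {wins} of {trials} trials")
```

Both strategies spend 16 fits, but the grid only ever tries 4 values of the dial that matters. Random search does not win every time, only more often than not on this landscape.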
Doing it right
Do:
- Tune against a validation set or cross-validation, not the test set
- Search learning rate and regularisation strength on a log scale
- Start coarse, then zoom into the best region
- Use early stopping to cut wasted runs
Don't:
- Report the tuning score as final performance (it is optimistically biased; report the score on the untouched test set instead)
- Grid-search a huge space blindly
- Forget to fix the random seed for reproducibility
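Three of these habits (log-scale sampling, coarse-then-fine search, and a fixed seed) fit in one short sketch. Everything here is illustrative: `validation_score` is a hypothetical stand-in for a real cross-validated score, its peak near 3e-3 is invented, and the ranges and the budget of 20 draws per pass are arbitrary assumptions.

```python
import math
import random

def sample_log_uniform(rng, low, high):
    """Draw a value whose logarithm is uniform between log(low) and log(high)."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))

def validation_score(learning_rate):
    """Hypothetical stand-in for a cross-validated score; peaks near 3e-3."""
    return -abs(math.log10(learning_rate) - math.log10(3e-3))

def search(rng, low, high, n):
    """Random search on a log scale: return the best of n draws."""
    draws = [sample_log_uniform(rng, low, high) for _ in range(n)]
    return max(draws, key=validation_score)

rng = random.Random(42)  # fixed seed so the whole search can be reproduced

# Coarse pass over five orders of magnitude.
coarse_best = search(rng, 1e-5, 1.0, 20)

# Fine pass: zoom into one decade centred on the coarse winner.
fine_low = coarse_best / math.sqrt(10)
fine_high = coarse_best * math.sqrt(10)
fine_best = search(rng, fine_low, fine_high, 20)

best = max([coarse_best, fine_best], key=validation_score)
print(coarse_best, best)
```

Sampling on a log scale gives each decade (1e-5 to 1e-4, 1e-4 to 1e-3, and so on) the same share of the draws, whereas uniform sampling on the raw scale would put about 90% of them between 0.1 and 1.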