Hyperparameter Tuning · Suman Bhadra Notes

Parameters vs hyperparameters

A model learns its parameters (weights) from data. Its hyperparameters — the dials you set before training — control how that learning happens, and you have to choose them yourself.

Parameters (learned)

Regression coefficients, network weights — fit by the training process.

Hyperparameters (chosen)

Tree depth, K in KNN, learning rate, regularisation strength — set by you.
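To make the split concrete, here is a minimal sketch in plain Python. The toy data and the `fit_ridge_1d` helper are hypothetical, chosen for illustration: a one-feature ridge regression without an intercept, fitted in closed form. The regularisation strength `alpha` is the hyperparameter we pick; the slope `w` is the parameter the fit computes from the data.

```python
def fit_ridge_1d(xs, ys, alpha):
    """Hypothetical helper: slope w minimising sum((y - w*x)^2) + alpha * w^2.

    Setting the derivative to zero gives w = sum(x*y) / (sum(x^2) + alpha).
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + alpha)

# Hypothetical toy data, roughly y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

for alpha in (0.0, 1.0, 10.0):       # hyperparameter: set by you
    w = fit_ridge_1d(xs, ys, alpha)  # parameter: learned from the data
    print(f"alpha={alpha:<5} -> learned w={w:.3f}")
```

Larger `alpha` shrinks the learned slope towards zero, so the same data yields a different parameter for each hyperparameter setting.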

The goal

Find the hyperparameter combination that gives the best cross-validated score — without ever touching the final test set.
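A minimal sketch of that workflow, assuming a hypothetical one-feature ridge model and synthetic data: split off a test set first, choose `alpha` by 5-fold cross-validation on the remaining data only, then refit and score the test set exactly once. Libraries such as scikit-learn (`GridSearchCV`, `cross_val_score`) handle the fold bookkeeping for you; this version spells it out.

```python
import random

def fit_ridge_1d(xs, ys, alpha):
    """Hypothetical model: slope w minimising sum((y - w*x)^2) + alpha * w^2."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + alpha)

def mse(w, xs, ys):
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys)) / len(xs)

def cv_score(xs, ys, alpha, k=5):
    """Mean validation MSE over k interleaved folds (lower is better)."""
    n = len(xs)
    errors = []
    for i in range(k):
        val = list(range(i, n, k))  # fold i: every k-th point
        val_set = set(val)
        tr = [j for j in range(n) if j not in val_set]
        w = fit_ridge_1d([xs[j] for j in tr], [ys[j] for j in tr], alpha)
        errors.append(mse(w, [xs[j] for j in val], [ys[j] for j in val]))
    return sum(errors) / k

# Hypothetical synthetic data: y = 2x + Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(-3, 3) for _ in range(60)]
ys = [2.0 * x + rng.gauss(0, 1.0) for x in xs]

# Split off the test set first; the tuning loop below never sees it.
train_x, train_y = xs[:45], ys[:45]
test_x, test_y = xs[45:], ys[45:]

alphas = [0.0, 0.1, 1.0, 10.0, 100.0]
scores = {a: cv_score(train_x, train_y, a) for a in alphas}
best_alpha = min(scores, key=scores.get)

# Refit on the full training split with the chosen alpha; score the test set once.
w = fit_ridge_1d(train_x, train_y, best_alpha)
print("CV MSE:", {a: round(s, 3) for a, s in scores.items()})
print("chosen alpha:", best_alpha, "| test MSE:", round(mse(w, test_x, test_y), 3))
```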

Grid search vs random search

Picture the validation score as a landscape over two hyperparameters, with a single sweet spot. Grid search probes a regular lattice of points; random search scatters the same number of points. On the same budget, random search often lands closer to the sweet spot.

The search strategies

Grid search (every combination)

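Try every combination of a fixed set of values. Thorough, but the count explodes combinatorially: 5 values for each of 4 dials gives 5 × 5 × 5 × 5 = 625 combinations, or 3,125 fits once each combination is scored with 5-fold CV.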
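A minimal grid-search sketch using `itertools.product`. The four-dial search space and `toy_score` are hypothetical stand-ins: in practice `toy_score` would train a model and return its cross-validated score. Here only `learning_rate` and `max_depth` affect the score, which also foreshadows why grids waste effort.

```python
import math
from itertools import product

# Hypothetical search space: 4 dials with 5 candidate values each.
grid = {
    "learning_rate": [0.001, 0.003, 0.01, 0.03, 0.1],
    "max_depth": [2, 4, 6, 8, 10],
    "min_samples_leaf": [1, 2, 5, 10, 20],
    "l2": [0.0, 0.01, 0.1, 1.0, 10.0],
}

def toy_score(params):
    # Hypothetical stand-in for a CV score (higher is better).
    # Only learning_rate and max_depth matter; the other two dials are ignored.
    return -(math.log10(params["learning_rate"]) + 2) ** 2 - (params["max_depth"] - 6) ** 2 / 10

def grid_search(score_fn, grid):
    """Evaluate every combination; return (best_score, best_params, n_evaluated)."""
    names = list(grid)
    best_score, best_params, n = -math.inf, None, 0
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        s = score_fn(params)
        n += 1
        if s > best_score:
            best_score, best_params = s, params
    return best_score, best_params, n

best_score, best_params, n_evaluated = grid_search(toy_score, grid)
print(n_evaluated)  # 625
print(best_params)
```

If each evaluation is itself a 5-fold CV run, the 625 evaluations cost 3,125 model fits.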

Random search (sample N combinations)

Sample random combinations. On the same budget it tries more distinct values of each dial, so it often finds a good region faster.
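A minimal random-search sketch, assuming the same hypothetical `toy_score` as a stand-in for a real CV score. Instead of a list of values, each dial gets a sampler; the learning rate is drawn log-uniformly. The random generator is seeded so the run is reproducible.

```python
import math
import random

# Hypothetical search space: one sampler per dial.
space = {
    "learning_rate": lambda rng: 10 ** rng.uniform(-3, -1),  # log-uniform
    "max_depth": lambda rng: rng.randint(2, 10),
    "min_samples_leaf": lambda rng: rng.randint(1, 20),
    "l2": lambda rng: 10 ** rng.uniform(-3, 1),
}

def toy_score(params):
    # Hypothetical stand-in for a CV score (higher is better).
    return -(math.log10(params["learning_rate"]) + 2) ** 2 - (params["max_depth"] - 6) ** 2 / 10

def random_search(score_fn, space, n_trials, seed=0):
    """Sample n_trials combinations; return (best_score, best_params, trials)."""
    rng = random.Random(seed)  # seeded so reruns draw the same combinations
    trials = []
    best_score, best_params = -math.inf, None
    for _ in range(n_trials):
        params = {name: sample(rng) for name, sample in space.items()}
        trials.append(params)
        s = score_fn(params)
        if s > best_score:
            best_score, best_params = s, params
    return best_score, best_params, trials

best_score, best_params, trials = random_search(toy_score, space, n_trials=60)
print(best_params)
# Every trial uses a new learning rate, versus 5 values in a 5-point grid.
print(len({t["learning_rate"] for t in trials}))
```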

Bayesian optimisation / Optuna (smart search)

Use the results of past trials to decide where to look next. Most worthwhile when each training run is expensive, because fewer runs are spent in unpromising regions.

Why random often wins

Often only one or two hyperparameters really matter. Grid search wastes evaluations on fine variations of the unimportant dials; random search spends those same evaluations sampling more values of the important ones.
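A small illustration of this argument, under the hypothetical assumption that only the first of two dials matters. With a budget of 9 runs, a 3 × 3 grid tests just 3 values of that dial, while 9 random samples test 9.

```python
import itertools
import random

budget = 9  # same number of training runs for both strategies

# Grid search: 3 values per dial -> 3 x 3 = 9 runs.
levels = [0.1, 0.5, 0.9]
grid_trials = list(itertools.product(levels, levels))

# Random search: 9 independent uniform samples in [0, 1) for each dial.
rng = random.Random(1)
random_trials = [(rng.random(), rng.random()) for _ in range(budget)]

def distinct_important(trials):
    # Assumption for the demo: dial 0 is the one that matters.
    return len({t[0] for t in trials})

print(distinct_important(grid_trials))    # 3
print(distinct_important(random_trials))  # 9
```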

Doing it right

Do

Tune against a validation set / CV, not the test set
Search learning rate & regularisation on a log scale
Start coarse, then zoom into the best region
Use early stopping to cut wasted runs
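A minimal sketch of two items from this list: searching the learning rate on a log scale, and searching coarse first before zooming in. The `score` function is a hypothetical validation score peaking near a learning rate of 3e-3; a real run would train a model at each value.

```python
import math

def score(lr):
    # Hypothetical validation score with a peak near lr = 3e-3.
    return -(math.log10(lr) - math.log10(3e-3)) ** 2

def log_grid(lo, hi, n):
    """n values evenly spaced in log10 between lo and hi (inclusive)."""
    a, b = math.log10(lo), math.log10(hi)
    return [10 ** (a + i * (b - a) / (n - 1)) for i in range(n)]

# Coarse pass: one value per decade across 1e-6 .. 1.
coarse = log_grid(1e-6, 1.0, 7)
coarse_best = max(coarse, key=score)

# Fine pass: zoom to one decade either side of the coarse winner.
fine = log_grid(coarse_best / 10, coarse_best * 10, 21)
fine_best = max(fine, key=score)
print(f"coarse: {coarse_best:.1e}  fine: {fine_best:.2e}")
```

A linear grid over the same 1e-6 to 1 range would place almost all of its points above 0.1, which is why log spacing suits dials that vary over orders of magnitude.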

Don't

Report the tuning score as final performance
Grid-search a huge space blindly
Forget to fix the random seed for reproducibility
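A small simulation of why the tuning score should not be reported as final performance. It assumes, hypothetically, 50 configurations that are all truly 80% accurate, each with a noisy validation estimate. Picking the maximum of noisy estimates selects for lucky noise, so the winning tuning score is biased upwards; an untouched test set gives an independent estimate. The fixed seed makes every rerun produce identical numbers.

```python
import random

rng = random.Random(42)  # fixed seed: rerunning gives identical numbers

TRUE_ACCURACY = 0.80  # hypothetical: every config is actually equally good
NOISE = 0.02          # hypothetical spread of a validation estimate
n_configs = 50

# Each config's tuning (validation) score = true accuracy + estimation noise.
tuning_scores = [TRUE_ACCURACY + rng.gauss(0, NOISE) for _ in range(n_configs)]
best = max(range(n_configs), key=lambda i: tuning_scores[i])

# A fresh, untouched test set gives an independent estimate for the chosen config.
test_score = TRUE_ACCURACY + rng.gauss(0, NOISE)

print(f"best tuning score:   {tuning_scores[best]:.3f}")  # optimistically biased
print(f"held-out test score: {test_score:.3f}")           # unbiased estimate
```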