Cross-Validation

One split is a lucky (or unlucky) draw

A single train–test split gives you one score — and it depends on which rows happened to land in the test set.

Get an easy test slice and the score looks great; get a hard one and it looks poor. With small datasets, that luck of the draw can swing the number wildly. Cross-validation reduces the effect of luck by rotating which slice is held out and averaging the results.

K-fold cross-validation

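Split the data into K roughly equal parts (folds). Then run K rounds: in each round, one fold serves as the validation set and the other K−1 folds train the model. Every fold gets exactly one turn as the validation set. Average the K validation scores to get the cross-validation estimate.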
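The rounds described above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the helper names (`k_fold_indices`, `cross_validate`), the toy "predict the training mean" model, and the toy data are all assumptions made for the example. Libraries such as scikit-learn offer ready-made equivalents (for example `KFold` and `cross_val_score`).

```python
import random
from statistics import mean


def k_fold_indices(n_rows, k, seed=0):
    """Yield (train_idx, val_idx) pairs for K-fold cross-validation."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # shuffle once so folds are random
    # Fold sizes differ by at most one row when n_rows is not divisible by k.
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size


def cross_validate(xs, ys, k, fit, score):
    """Run K rounds and return the list of per-fold validation scores."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        scores.append(score(model, [xs[i] for i in val_idx], [ys[i] for i in val_idx]))
    return scores


# Toy stand-in model (an assumption for illustration): always predict the training mean.
def fit_mean(train_x, train_y):
    return mean(train_y)


def neg_mae(model, val_x, val_y):
    # Negative mean absolute error, so that higher is better.
    return -mean(abs(y - model) for y in val_y)


xs = list(range(10))
ys = [2.0 * x for x in xs]
fold_scores = cross_validate(xs, ys, k=5, fit=fit_mean, score=neg_mae)
cv_score = mean(fold_scores)  # the single cross-validated estimate
```

Passing `fit` and `score` as functions keeps the splitting logic independent of any particular model, so the same loop can be reused for every candidate being compared.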

Why it's better

Every row: tested once

No data is wasted: each example serves as validation exactly once and as training data in the other K−1 rounds.

Mean ± spread: K scores

You get an average and a sense of how much the score varies across folds.

Stable estimate: less luck

Averaging smooths out much of the good or bad luck of any single split.
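As a small illustration of reporting a mean and a spread, the sketch below summarizes a set of per-fold scores with Python's `statistics` module. The five accuracies are made-up numbers used only to show the calculation.

```python
from statistics import mean, stdev

# Hypothetical per-fold accuracies from a 5-fold run (made-up numbers).
fold_scores = [0.81, 0.79, 0.84, 0.77, 0.82]

avg = mean(fold_scores)
spread = stdev(fold_scores)  # sample standard deviation across the folds

print(f"accuracy: {avg:.3f} ± {spread:.3f} over {len(fold_scores)} folds")
```

Because the K training sets overlap heavily, the fold scores are not independent, so the spread is best read as a rough indication of variability rather than a formal confidence interval.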

Common variants

Stratified K-fold: keep class balance

Each fold mirrors the overall class proportions — essential for imbalanced classification.
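One simple way to get folds that mirror the class proportions is to deal each class's rows out round-robin. The sketch below assumes that approach; the function name `stratified_k_fold` and the toy labels are illustrative, and scikit-learn's `StratifiedKFold` is the usual ready-made choice.

```python
import random
from collections import Counter, defaultdict


def stratified_k_fold(labels, k, seed=0):
    """Yield (train_idx, val_idx) with each class spread evenly over the folds."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)

    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for members in by_class.values():
        rng.shuffle(members)
        # Deal each class's rows round-robin so every fold gets its share.
        for position, i in enumerate(members):
            folds[position % k].append(i)

    for f in range(k):
        val_idx = folds[f]
        train_idx = [i for g in range(k) if g != f for i in folds[g]]
        yield train_idx, val_idx


# Imbalanced toy labels: 90% class 0, 10% class 1 (illustrative data).
labels = [0] * 45 + [1] * 5
for train_idx, val_idx in stratified_k_fold(labels, k=5):
    print(Counter(labels[i] for i in val_idx))
```

With these toy labels every validation fold holds nine rows of class 0 and one row of class 1, whereas a plain random split could easily leave a fold with no minority-class rows at all.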

Leave-One-Out: K = N

Each fold is a single row, so the model is trained N times. Thorough but expensive, so it is mostly reserved for very small datasets.

Time-series split: past → future

Folds respect time order: always train on earlier data, validate on later.
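A common way to respect time order is an expanding window: the training set grows with each round and the validation block is always the next slice in time. The sketch below assumes rows are already sorted from oldest to newest; the function name `time_series_splits` is illustrative, and scikit-learn's `TimeSeriesSplit` follows a similar scheme.

```python
def time_series_splits(n_rows, n_splits):
    """Yield (train_idx, val_idx) where validation always follows training in time.

    Assumes rows are already sorted from oldest to newest.
    """
    if n_rows < n_splits + 1:
        raise ValueError("need at least n_splits + 1 rows")
    fold_size = n_rows // (n_splits + 1)
    for s in range(1, n_splits + 1):
        train_end = fold_size * s
        # The last round's validation block absorbs any leftover rows.
        val_end = n_rows if s == n_splits else train_end + fold_size
        yield list(range(train_end)), list(range(train_end, val_end))


for train_idx, val_idx in time_series_splits(n_rows=10, n_splits=4):
    print(train_idx, "->", val_idx)
```

Note that the rows are never shuffled here: shuffling would let the model train on the future and validate on the past.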

Picking K

K = 5 and K = 10 are the standard choices: a good balance between a reliable estimate and reasonable compute. Larger K gives each round more training data but requires more model fits.

Where it fits

Cross-validation is the workhorse for comparing models and tuning hyperparameters — see Hyperparameter Tuning. Keep a final untouched test set aside for the last, honest number; use CV on the rest for all your decisions.
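The workflow above can be sketched as a two-level split: first set aside the test rows, then build the folds from what remains. The function name `split_holdout_and_folds` and the 20% test fraction are assumptions chosen for illustration.

```python
import random


def split_holdout_and_folds(n_rows, k=5, test_fraction=0.2, seed=0):
    """Return (test_idx, folds): a held-out test set plus K folds of the remaining rows."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = round(n_rows * test_fraction)
    test_idx = indices[:n_test]   # set aside; not used for any decision
    dev_idx = indices[n_test:]    # everything else goes through cross-validation
    folds = [dev_idx[i::k] for i in range(k)]  # K roughly equal slices
    return test_idx, folds


test_idx, folds = split_holdout_and_folds(n_rows=100)
for f, val_idx in enumerate(folds):
    train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
    # ... fit each candidate on train_idx and score it on val_idx ...

# Only after a model has been chosen: refit on all non-test rows and score once on test_idx.
```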

Watch for leakage

Any preprocessing that learns from data (scaling, imputation, feature selection) must happen inside each fold: fit it on that fold's training part only, then apply it to the validation part. Otherwise information from the validation fold leaks into training and inflates the score. Full story: data leakage & pipelines.
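To make the rule concrete, the sketch below standardizes a single feature inside each fold, learning the mean and standard deviation from the training rows only. The helper names (`fit_scaler`, `apply_scaler`), the toy column, and the pre-made folds are assumptions for illustration. In scikit-learn, wrapping the preprocessing and the model in a `Pipeline` and passing it to `cross_val_score` achieves the same effect.

```python
from statistics import mean, pstdev


def fit_scaler(values):
    """Learn standardization parameters from the given values only."""
    mu = mean(values)
    sigma = pstdev(values) or 1.0  # guard against zero spread
    return mu, sigma


def apply_scaler(values, scaler):
    """Standardize values using previously learned parameters."""
    mu, sigma = scaler
    return [(v - mu) / sigma for v in values]


feature = [1.0, 2.0, 3.0, 4.0, 5.0, 100.0]  # toy column with one extreme value
folds = [[0, 1], [2, 3], [4, 5]]            # pre-made validation folds (row indices)

for val_idx in folds:
    train_idx = [i for i in range(len(feature)) if i not in val_idx]
    # Correct: the scaler sees only this round's training rows.
    scaler = fit_scaler([feature[i] for i in train_idx])
    train_scaled = apply_scaler([feature[i] for i in train_idx], scaler)
    val_scaled = apply_scaler([feature[i] for i in val_idx], scaler)
    # ... fit the model on train_scaled and score it on val_scaled ...

# Leaky alternative (avoid): calling fit_scaler(feature) before splitting
# lets the validation rows shape mu and sigma.
```

In the last round the extreme value 100.0 sits in the validation fold, so a scaler fitted on the whole column would have absorbed it into the mean, while the per-fold scaler never sees it.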