Cross-Validation
One split is a lucky (or unlucky) draw
A single train–test split gives you one score — and it depends on which rows happened to land in the test set.
Get an easy test slice and the score looks great; get a hard one and it looks poor. With small datasets that luck of the draw can swing the number wildly. Cross-validation reduces the effect of luck by rotating which slice is held out and averaging the results.
K-fold cross-validation
Split the data into K roughly equal parts (folds); when the row count is not divisible by K, fold sizes differ by at most one. Then run K rounds: in each round, one fold is the validation set and the other K−1 folds train the model. Every fold gets a turn. Average the K scores.
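The rounds described above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the helper names `k_fold_indices` and `cross_val_scores`, the toy "predict the training mean" model, and the toy data are all assumptions made for the example.

```python
import random
from statistics import mean


def k_fold_indices(n_rows, k, seed=0):
    """Yield (train_idx, val_idx) pairs, one pair per fold."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # shuffle once so folds ignore row order
    # Fold sizes differ by at most one when n_rows is not divisible by k.
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size


def cross_val_scores(X, y, k, fit, score, seed=0):
    """Train on K-1 folds and score on the held-out fold, for each of the K rounds."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(X), k, seed):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        scores.append(score(model, [X[i] for i in val_idx], [y[i] for i in val_idx]))
    return scores


# Hypothetical toy model: always predict the mean of the training targets.
def fit_mean(X_train, y_train):
    return mean(y_train)


def mean_abs_error(model, X_val, y_val):
    return mean(abs(target - model) for target in y_val)


X = [[i] for i in range(10)]           # toy features (unused by the toy model)
y = [float(i % 3) for i in range(10)]  # toy targets
scores = cross_val_scores(X, y, k=5, fit=fit_mean, score=mean_abs_error)
print(scores, mean(scores))            # K per-fold errors and their average
```

In practice a library splitter such as scikit-learn's `KFold` would usually replace the hand-written index logic; the sketch only makes the rotation explicit.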
Why it's better
No data is wasted — each example serves as validation exactly one time.
You get an average and a sense of how much the score varies across folds.
Averaging cancels out the good-luck/bad-luck of any single split.
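As a small illustration of reporting both the average and the spread, the sketch below summarizes a list of per-fold scores. The score values are made up purely for the example and do not come from any real model.

```python
from statistics import mean, stdev

# Hypothetical per-fold accuracies, invented for illustration only.
fold_scores = [0.81, 0.78, 0.84, 0.80, 0.77]

avg = mean(fold_scores)
spread = stdev(fold_scores)  # sample standard deviation across the folds

print(f"{avg:.3f} +/- {spread:.3f} over {len(fold_scores)} folds")
```

A large spread relative to the gap between two models' averages suggests the difference between them may just be split-to-split noise.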
Common variants
Stratified K-fold: each fold mirrors the overall class proportions. Essential for imbalanced classification.
Leave-one-out: each fold is a single row, so K equals the number of rows. Thorough but expensive, so it is used only for tiny datasets.
Time-series split: folds respect time order. Always train on earlier data and validate on later data.
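The class-balanced and time-ordered variants above can be sketched as follows. Both helpers (`stratified_folds`, `time_series_splits`) are hypothetical names written for this example, and the expanding-window scheme is just one reasonable way to respect time order. Leave-one-out needs no extra code: it is ordinary K-fold with K set to the number of rows.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Assign rows to k folds so each fold keeps roughly the overall class mix."""
    rows_by_class = defaultdict(list)
    for row, label in enumerate(labels):
        rows_by_class[label].append(row)

    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for rows in rows_by_class.values():
        rng.shuffle(rows)
        # Deal each class's rows round-robin across the folds.
        for position, row in enumerate(rows):
            folds[position % k].append(row)
    return folds


def time_series_splits(n_rows, k):
    """Yield (train_idx, val_idx) with an expanding training window.

    Assumes rows are already sorted from oldest to newest.
    """
    block = n_rows // (k + 1)
    for i in range(1, k + 1):
        train_idx = list(range(0, i * block))
        end = (i + 1) * block if i < k else n_rows  # last block absorbs any remainder
        val_idx = list(range(i * block, end))
        yield train_idx, val_idx


labels = [0] * 8 + [1] * 2  # imbalanced toy labels: 80% class 0, 20% class 1
for fold in stratified_folds(labels, k=2):
    print(sorted(labels[row] for row in fold))  # each fold keeps the 4:1 mix

for train_idx, val_idx in time_series_splits(10, k=4):
    print(train_idx, "->", val_idx)  # validation rows always come after training rows
```

scikit-learn ships comparable splitters (`StratifiedKFold`, `LeaveOneOut`, `TimeSeriesSplit` in `sklearn.model_selection`), which are the usual choice over hand-rolled versions.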
K = 5 or K = 10 are the standard choices — a good balance of reliable estimate and reasonable compute.
Where it fits
Cross-validation is the workhorse for comparing models and tuning hyperparameters — see Hyperparameter Tuning. Keep a final untouched test set aside for the last, honest number; use CV on the rest for all your decisions.
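One way to arrange this, sketched under simple assumptions: carve off the test rows first, then build folds only from what remains. The helper `hold_out_test_set`, the 20% test fraction, and the interleaved fold construction are illustrative choices, not prescribed by this guide.

```python
import random


def hold_out_test_set(n_rows, test_fraction=0.2, seed=0):
    """Return (dev_idx, test_idx); test rows are set aside before any cross-validation."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(n_rows * test_fraction))
    return indices[n_test:], indices[:n_test]


dev_idx, test_idx = hold_out_test_set(100)

# Cross-validate on dev_idx only: compare models and tune hyperparameters here.
k = 5
folds = [dev_idx[i::k] for i in range(k)]  # interleaved folds over the shuffled dev rows
for val_fold in folds:
    val_set = set(val_fold)
    train_fold = [i for i in dev_idx if i not in val_set]
    # Fit on train_fold and score on val_fold (model code omitted in this sketch).

# Only after every decision is final: refit on all of dev_idx, then score once on test_idx.
```

The key property is that `test_idx` never appears in any fold, so the final score is not influenced by the choices made during cross-validation.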
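Any preprocessing that learns from data (scaling, imputation, feature selection) must happen inside each fold: fit it on that fold's training part only, then apply it to both the training and validation parts. Otherwise information from the validation fold leaks into training and the score comes out too optimistic. Full story: data leakage & pipelines.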
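To make the leak concrete, here is a minimal sketch using standardization on a single toy column. The helpers `fit_scaler` and `apply_scaler`, the data, and the hand-picked split are all assumptions for the demo.

```python
from statistics import mean, pstdev


def fit_scaler(values):
    """Learn centering and scaling constants from the given values only."""
    return mean(values), pstdev(values) or 1.0  # fall back to 1.0 if there is no spread


def apply_scaler(values, scaler):
    center, scale = scaler
    return [(v - center) / scale for v in values]


feature = [1.0, 2.0, 3.0, 4.0, 100.0]   # toy column; the last row is an outlier
train_idx, val_idx = [0, 1, 2, 3], [4]  # one fold's split, chosen by hand for the demo

# Correct: statistics come from this fold's training rows only.
scaler = fit_scaler([feature[i] for i in train_idx])
train_scaled = apply_scaler([feature[i] for i in train_idx], scaler)
val_scaled = apply_scaler([feature[i] for i in val_idx], scaler)

# Leaky: statistics computed on all rows already "know" about the validation outlier.
leaky_scaler = fit_scaler(feature)

print("train-only (center, scale):", scaler)
print("all-rows   (center, scale):", leaky_scaler)
```

The two scalers differ because the all-rows version has seen the validation row. With scikit-learn, wrapping the preprocessing and the model in a `Pipeline` and passing that to `cross_val_score` is the usual way to get the per-fold fitting done automatically.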