Feature Scaling
When units hijack the model
Suppose you predict from age (0–80) and salary (0–100,000). To a distance-based model, a €10,000 gap dwarfs a 40-year gap — salary drowns out age purely because of its scale.
Feature scaling fixes this by rescaling every feature to a comparable range, so each one contributes on its merits, not its units.
Two features, two scalings
The same points plotted three ways: raw (salary axis dominates), min-max normalized to [0, 1], and standardized to mean 0 / std 1.
The two methods
Squeezes values into [0, 1]. Preserves the shape, but sensitive to outliers (min/max are extreme points).
Centres at 0 with unit variance. No fixed range; far more robust to outliers. The common default.
Which to use
- You need a bounded range (e.g. image pixels, neural net inputs)
- The data has few outliers
- The distribution isn't assumed to be Gaussian
- There are outliers (min-max would crush everything else)
- The model assumes roughly Gaussian features
- You want the safe, general-purpose default
Fit the scaler (its min/max or μ/σ) on the training set only, then transform train, validation, and test with those same numbers. Re-fitting on test leaks information — see Train–Test Split.