Feature Engineering
Garbage in, garbage out
Feature engineering is reshaping your raw data into inputs that expose the pattern a model is trying to learn.
A well-known truism holds that "applied machine learning is basically feature engineering." The same algorithm can flop or shine depending on what you feed it, and a well-crafted feature often beats a fancier model trained on raw columns.
Use what you know about the problem to build columns that make the right answer easy to read off.
See a feature unlock the pattern
Take two classes that no straight line can separate in the raw (x, y) space, for example one class surrounding the other. Engineer one new feature, the distance from the centre, and the classes pull apart cleanly: a single threshold on that distance now separates them.
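A minimal sketch of that idea, assuming synthetic data: the two classes are generated here as an inner cluster and an outer ring around the origin. The ring radii, the sample size, and the threshold of 2.0 are illustrative choices, not values taken from the demo.

```python
import math
import random

random.seed(0)

def sample_ring(r_min, r_max, n):
    """Sample n points whose distance from the origin lies in [r_min, r_max]."""
    points = []
    for _ in range(n):
        r = random.uniform(r_min, r_max)
        theta = random.uniform(0.0, 2.0 * math.pi)
        points.append((r * math.cos(theta), r * math.sin(theta)))
    return points

inner = sample_ring(0.0, 1.0, 200)  # class 0: cluster around the centre
outer = sample_ring(3.0, 4.0, 200)  # class 1: ring surrounding class 0

def radius(point):
    """Engineered feature: distance from the centre (0, 0)."""
    x, y = point
    return math.hypot(x, y)

# In the raw space, the x ranges of the two classes overlap
# (y behaves the same way by symmetry), so no cut on x alone works.
inner_x = [x for x, _ in inner]
outer_x = [x for x, _ in outer]
x_overlaps = min(outer_x) < max(inner_x) and min(inner_x) < max(outer_x)

# A single threshold on the new feature splits the classes.
THRESHOLD = 2.0  # any value between the two rings would do
correct = sum(radius(p) < THRESHOLD for p in inner)
correct += sum(radius(p) >= THRESHOLD for p in outer)
accuracy = correct / (len(inner) + len(outer))

print(f"x ranges overlap: {x_overlaps}, accuracy using radius: {accuracy:.2f}")
```

The model did not get smarter here; the input changed so that a one-dimensional rule became enough.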
Common moves
A ratio often carries the real signal — price per square foot beats price and size separately.
Multiply features when it is their combination that matters, not each one alone.
Explode a timestamp into the pieces a model can use, such as hour, day of week, and month.
Group a continuous value into ranges when the relationship moves in steps rather than smoothly.
Summarise related rows — a user's average order, total visits, last-seen gap.
Tame skewed values, for example with a log transform, so big outliers don't dominate.
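The moves above can be sketched on a toy dataset. Everything in this example is hypothetical: the `listing` and `orders` records, their field names, the bin edges, and the `NOW` reference time are made up for illustration, and `log1p` is one common way to tame skew rather than the only one.

```python
import bisect
import math
from collections import defaultdict
from datetime import datetime

# Hypothetical raw data, invented for this sketch.
listing = {"price": 300_000, "sqft": 1_500}
orders = [
    {"user": "a", "amount": 20.0, "ts": "2024-03-01T09:30:00"},
    {"user": "a", "amount": 40.0, "ts": "2024-03-09T18:05:00"},
    {"user": "b", "amount": 1000.0, "ts": "2024-03-10T23:45:00"},
]
AMOUNT_EDGES = [25.0, 100.0]   # bin boundaries, chosen arbitrarily
NOW = datetime(2024, 3, 11)    # stand-in for the prediction time

# Ratio: price per square foot.
price_per_sqft = listing["price"] / listing["sqft"]

for order in orders:
    ts = datetime.fromisoformat(order["ts"])

    # Date parts: split the timestamp into usable pieces.
    order["hour"] = ts.hour
    order["weekday"] = ts.weekday()            # Monday = 0 ... Sunday = 6
    order["is_weekend"] = int(ts.weekday() >= 5)

    # Interaction: amount only counts when the order falls on a weekend.
    order["weekend_amount"] = order["amount"] * order["is_weekend"]

    # Binning: index of the range the amount falls into (0, 1, or 2).
    order["amount_bin"] = bisect.bisect_right(AMOUNT_EDGES, order["amount"])

    # Log transform: compresses large values; log1p also handles 0 safely.
    order["log_amount"] = math.log1p(order["amount"])

# Aggregation: summarise each user's rows into one row of features.
rows_by_user = defaultdict(list)
for order in orders:
    rows_by_user[order["user"]].append(order)

user_features = {}
for user, rows in rows_by_user.items():
    last_seen = max(datetime.fromisoformat(r["ts"]) for r in rows)
    user_features[user] = {
        "avg_order": sum(r["amount"] for r in rows) / len(rows),
        "n_orders": len(rows),
        "days_since_last": (NOW - last_seen).days,  # whole days only
    }

print(price_per_sqft)
print(user_features)
```

Plain dictionaries are used only to keep each transformation visible; with a dataframe library the same steps are typically column-wise operations.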
Related preprocessing steps
Feature engineering overlaps with a few cousin steps, each with its own article:
Turn text categories into numeric form — see Encoding Categorical Variables.
Do's and don'ts
Do
- Lean on domain knowledge: what actually drives the outcome?
- Fit transforms on training data only, then apply them to the test data.
- Check that a new feature truly helps on validation data.
Don't
- Leak the target into a feature (data leakage).
- Use future information that is unavailable at prediction time.
- Add hundreds of features blindly; this invites overfitting.
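The rule about fitting transforms on training data only can be sketched with a simple standardisation. The values and the `standardize` helper below are hypothetical, and the train/test split is assumed to have been made already.

```python
import statistics

# Hypothetical feature values; the split is assumed to exist already.
train = [10.0, 12.0, 14.0, 16.0, 18.0]
test = [11.0, 40.0]

# Fit: the scaling parameters come from the training rows only.
mean = statistics.fmean(train)
std = statistics.pstdev(train)

def standardize(values):
    """Apply the train-fitted parameters; never re-estimate them on new data."""
    return [(v - mean) / std for v in values]

train_scaled = standardize(train)
test_scaled = standardize(test)

# The leaky alternative estimates the parameters over train + test,
# so test rows influence what the model sees during training.
leaky_mean = statistics.fmean(train + test)

print(f"train-only mean: {mean}, leaky mean: {leaky_mean:.2f}")
```

The same discipline applies to any fitted transform, including bin edges and per-group averages: estimate them on the training rows, then reuse them unchanged everywhere else.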
Neural networks learn features automatically from raw signals (pixels, audio, text), which is why manual feature engineering matters less there. For classic tabular ML, it still matters enormously.