PCA — Principal Component Analysis
Fewer numbers, same story
Datasets often have many features that move together. PCA finds new axes along which the data varies the most, then keeps just the top few — compressing the data while losing as little information as possible.
The key insight: variance is information. A direction where points spread out a lot tells you a lot; a direction where they barely move tells you little. PCA ranks directions by how much variance they capture and lets you drop the boring ones.
Watch 2-D collapse to 1-D
Take an elongated cloud of 2-D points. PCA centres it, finds the long axis (the first principal component), and projects every point onto it, turning two numbers per point into one while keeping most of the variance.
How it works
Shift the cloud so its centre sits at the origin.
The first principal component is the axis of maximum variance; each later component is the axis of maximum remaining variance that is orthogonal to all the earlier ones.
Each component captures a share of the total variance, in descending order.
Project onto the top components; discard the rest. That's the compression.
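The four steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function names `pca_fit`, `pca_transform`, and `pca_inverse` and the synthetic cloud are made up for the example, and libraries such as scikit-learn compute the same result through an SVD instead of an explicit covariance matrix.

```python
import numpy as np

def pca_fit(X, n_components):
    """Fit PCA on X of shape (n_samples, n_features).

    Returns the feature means, the top components as rows,
    and the variance share of every component (descending).
    """
    mean = X.mean(axis=0)
    Xc = X - mean                                # step 1: centre the cloud
    cov = np.cov(Xc, rowvar=False)               # feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)       # step 2: orthogonal axes (ascending order)
    order = np.argsort(eigvals)[::-1]            # step 3: rank by variance, descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    components = eigvecs[:, :n_components].T     # step 4: keep the top axes
    ratio = eigvals / eigvals.sum()
    return mean, components, ratio

def pca_transform(X, mean, components):
    """Project centred data onto the kept components."""
    return (X - mean) @ components.T

def pca_inverse(Z, mean, components):
    """Map compressed points back into the original feature space."""
    return Z @ components + mean

# Hypothetical elongated 2-D cloud, offset from the origin.
rng = np.random.default_rng(0)
t = rng.normal(size=500)
X = np.column_stack([
    3.0 * t + rng.normal(scale=0.3, size=500),
    1.5 * t + rng.normal(scale=0.3, size=500),
]) + np.array([5.0, -2.0])

mean, comps, ratio = pca_fit(X, n_components=1)
Z = pca_transform(X, mean, comps)        # shape (500, 1): one number per point
X_hat = pca_inverse(Z, mean, comps)      # approximate reconstruction

print("variance share per component:", ratio)
print("mean squared reconstruction error:", np.mean((X - X_hat) ** 2))
```

The reconstruction error is the variance that lived along the discarded axis, which is why dropping a low-variance direction costs so little.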
The animation showed you the answer; now search for it yourself. Rotate a candidate axis through the same cloud and watch how much variance the projections capture. Only one axis maximizes the bar (an axis and its 180° flip are the same direction). That direction is PC1; "the eigenvector of the covariance matrix with the largest eigenvalue" is just the closed-form way of finding it.
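Move the slider slowly near the peak: the bar barely changes, because projected variance is a smooth function of the angle with zero slope at its maximum. A small error in the axis therefore costs very little variance, which is one reason PC1 is well behaved when the cloud is clearly elongated. Rotate 90° away from PC1 and you are looking at PC2, which in 2-D is the single axis that captures the least variance.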
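The slider experiment can be reproduced numerically. The sketch below assumes a synthetic cloud and a 0.1° angle grid (both chosen only for illustration), sweeps a unit vector through 180°, and compares the best angle with the one given by the covariance eigenvector.

```python
import numpy as np

# Hypothetical elongated cloud, centred before the sweep.
rng = np.random.default_rng(1)
t = rng.normal(size=400)
X = np.column_stack([
    2.0 * t + rng.normal(scale=0.4, size=400),
    1.0 * t + rng.normal(scale=0.4, size=400),
])
Xc = X - X.mean(axis=0)

# Candidate axes: unit vectors from 0 to 180 degrees in 0.1 degree steps.
angles = np.linspace(0.0, np.pi, 1801)
dirs = np.column_stack([np.cos(angles), np.sin(angles)])

# Variance of the projections onto each candidate axis (the "bar").
proj_var = (Xc @ dirs.T).var(axis=0, ddof=1)
best_angle = angles[np.argmax(proj_var)]
worst_angle = angles[np.argmin(proj_var)]

# Closed-form answer: eigenvectors of the covariance matrix.
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))  # ascending order
pc1 = eigvecs[:, -1]
pc1_angle = np.arctan2(pc1[1], pc1[0]) % np.pi   # fold the sign ambiguity into [0, pi)

print("best angle from sweep (deg):", np.degrees(best_angle))
print("PC1 angle from eigh (deg):  ", np.degrees(pc1_angle))
print("worst angle from sweep (deg):", np.degrees(worst_angle))
```

The peak of the sweep matches the largest eigenvalue and the trough matches the smallest, so the brute-force search and the eigendecomposition agree up to the grid resolution.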
Choosing how many components
Plot cumulative explained variance against the number of components and keep enough to reach, say, 95% of the total. Often a handful of components capture almost everything in dozens of correlated features.
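A rough sketch of that rule follows. The helper name `n_components_for`, the 95% default, and the synthetic data (20 features driven by 3 hidden factors) are assumptions for the example only.

```python
import numpy as np

def n_components_for(X, target=0.95):
    """Smallest number of components whose cumulative variance share reaches target."""
    Xc = X - X.mean(axis=0)
    eigvals = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))[::-1]  # descending
    eigvals = np.clip(eigvals, 0.0, None)        # guard against tiny negative round-off
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    # First index where the cumulative share is >= target, converted to a count.
    k = int(np.searchsorted(cumulative, target)) + 1
    return min(k, len(eigvals)), cumulative

# Hypothetical data: 20 correlated features generated from 3 latent factors plus noise.
rng = np.random.default_rng(2)
latent = rng.normal(size=(300, 3))
mixing = rng.normal(size=(3, 20))
X = latent @ mixing + 0.05 * rng.normal(size=(300, 20))

k, cumulative = n_components_for(X, target=0.95)
print("components needed for 95%:", k)
print("cumulative share, first five:", cumulative[:5])
```

Plotting `cumulative` against the component count gives the curve described above; the threshold itself is a judgment call, not a property of PCA.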
Good for
- Visualizing high-dimensional data in 2-D
- Speeding up models by shrinking inputs
- Removing correlated/redundant features and noise
Watch out for
- Components are hard to interpret (mixes of features)
- Assumes linear structure; fails on curved manifolds
- Scale first: PCA is sensitive to feature units
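To illustrate the linear-structure caveat from the list above, here is a small sketch with points on a circle. The data are intrinsically one-dimensional (a single angle describes each point), yet no straight axis summarizes them. The evenly spaced sample is an assumption chosen to keep the result deterministic.

```python
import numpy as np

# Points evenly spaced on a unit circle: one underlying parameter (theta),
# but a curved, non-linear shape in 2-D.
theta = np.linspace(0.0, 2.0 * np.pi, 1000, endpoint=False)
X = np.column_stack([np.cos(theta), np.sin(theta)])

Xc = X - X.mean(axis=0)
eigvals = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))[::-1]  # descending
ratio = eigvals / eigvals.sum()

print("variance share per component:", ratio)
```

Each component carries about half of the variance, so keeping only PC1 would discard roughly half of it even though one number per point would be enough in principle.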
PCA is the classic unsupervised dimensionality-reduction method; for non-linear structure, t-SNE and UMAP are popular alternatives, especially for visualization. Scale features first whenever they are measured in different units.
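The effect of units is easy to see in a small experiment. The sketch below assumes two correlated, made-up features (a height in metres and a weight in grams) and compares PC1 before and after standardizing each feature to unit variance; `first_component` is a hypothetical helper, not a library function.

```python
import numpy as np

def first_component(X):
    """Return PC1 and its share of the total variance."""
    Xc = X - X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))  # ascending order
    return eigvecs[:, -1], eigvals[-1] / eigvals.sum()

# Hypothetical data: height in metres, weight in grams, correlation about 0.8.
rng = np.random.default_rng(3)
n = 500
z = rng.normal(size=n)
height_m = 1.7 + 0.1 * z
weight_g = 70_000 + 10_000 * (0.8 * z + 0.6 * rng.normal(size=n))
X = np.column_stack([height_m, weight_g])

# Raw units: the feature with the largest numbers dominates.
pc1_raw, share_raw = first_component(X)

# Standardized: zero mean and unit variance per feature.
X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
pc1_std, share_std = first_component(X_std)

print("PC1 on raw data:         ", pc1_raw, "share:", share_raw)
print("PC1 on standardized data:", pc1_std, "share:", share_std)
```

On the raw data PC1 points almost entirely along the weight axis simply because grams produce large numbers; after standardizing, both features contribute equally to PC1.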