PCA — Principal Component Analysis

ML · unsupervised · dimensionality reduction · variance

Fewer numbers, same story

Datasets often have many features that move together. PCA finds new axes along which the data varies the most, then keeps just the top few — compressing the data while losing as little information as possible.

The key insight: variance is information. A direction where points spread out a lot tells you a lot; a direction where they barely move tells you little. PCA ranks directions by how much variance they capture and lets you drop the boring ones.

Watch 2-D collapse to 1-D

Start with an elongated cloud of points. PCA centres it, finds the long axis (the first principal component), and projects every point onto it, turning two numbers into one with hardly any loss.
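To put a number on "hardly any loss", here is a minimal NumPy sketch of the same collapse. The cloud itself (500 points, a 30° tilt, a 10:1 spread) is made up for illustration and is not the data behind the animation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical elongated 2-D cloud: wide along one axis, narrow along the other,
# then rotated by 30 degrees so the long axis is not aligned with x or y.
raw = rng.normal(size=(500, 2)) * np.array([3.0, 0.3])
theta = np.deg2rad(30)
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
points = raw @ rotation.T + np.array([5.0, -2.0])

# Centre the cloud, then take the top eigenvector of the covariance matrix.
centred = points - points.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(centred, rowvar=False))
pc1 = eigvecs[:, -1]            # eigh returns eigenvalues in ascending order

# Two numbers per point become one: the coordinate along PC1.
scores = centred @ pc1          # shape (500,)

# Map the 1-D scores back to 2-D to see what was lost.
reconstructed = np.outer(scores, pc1) + points.mean(axis=0)
retained = eigvals[-1] / eigvals.sum()
mse = np.mean(np.sum((points - reconstructed) ** 2, axis=1))

print(f"variance retained by PC1: {retained:.3f}")
print(f"mean squared reconstruction error: {mse:.3f}")
```

The reconstruction error is exactly the variance that lived along the discarded short axis, so the thinner the cloud, the cheaper the compression.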

How it works

1. Center: subtract the mean

Shift the cloud so its centre sits at the origin.

2. Find directions: eigenvectors of the covariance matrix

The principal components are the axes of maximum variance, each orthogonal to all the ones before it.

3. Rank by variance: eigenvalues

Each component's eigenvalue is the variance it captures. Sorting the eigenvalues in descending order ranks the components by their share of the total variance.

4. Keep top k: project & drop

Project onto the top components; discard the rest. That's the compression.
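The four steps above can be sketched in a few lines of NumPy. The function name `pca` and the toy dataset are assumptions made for illustration; in practice a library routine would normally be used, and it may compute the same result via an SVD rather than an explicit covariance matrix.

```python
import numpy as np

def pca(X, k):
    """Return (scores, components, explained_variance_ratio) for the top k components."""
    # 1. Center: subtract the per-feature mean.
    mean = X.mean(axis=0)
    Xc = X - mean

    # 2. Find directions: eigenvectors of the covariance matrix.
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # symmetric input, ascending eigenvalues

    # 3. Rank by variance: sort eigenvalues in descending order.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # 4. Keep top k: project onto the leading components and drop the rest.
    components = eigvecs[:, :k]              # shape (n_features, k)
    scores = Xc @ components                 # shape (n_samples, k)
    return scores, components, eigvals[:k] / eigvals.sum()

# Hypothetical data: 3 features, where the third is nearly a copy of the first.
rng = np.random.default_rng(1)
a = rng.normal(size=200)
b = rng.normal(size=200)
X = np.column_stack([a, b, a + 0.05 * rng.normal(size=200)])

scores, components, ratio = pca(X, k=2)
print(scores.shape, ratio.round(3))
```

Because the third feature is almost redundant, two components keep nearly all of the variance of the three original columns.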

The animation showed you the answer — now search for it yourself. Rotate a candidate axis through the same cloud and watch how much variance the projections capture. Only one angle maximizes the bar. That direction is PC1; "eigenvector of the covariance matrix" is just the closed-form way of finding it.

Wiggle the slider slowly near the peak: the bar is flat-topped, because projected variance changes only slowly near the optimum, so a small error in the angle costs very little variance. Rotate 90° away from PC1 and you're looking at PC2, which in 2-D is the worst possible single axis: the one that captures the least variance.
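The slider search can also be run as a brute-force sweep. This is a sketch under assumed settings (a made-up cloud with its long axis at 40°, a 0.5° grid): it compares the best angle found by sweeping with the eigenvector answer, locates the worst angle, and measures how little variance is lost 5° off the peak.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical elongated cloud whose long axis sits at 40 degrees.
true_angle = np.deg2rad(40)
raw = rng.normal(size=(1000, 2)) * np.array([2.0, 0.5])
rot = np.array([[np.cos(true_angle), -np.sin(true_angle)],
                [np.sin(true_angle),  np.cos(true_angle)]])
centred = raw @ rot.T
centred -= centred.mean(axis=0)

# Sweep candidate axes from 0 to 180 degrees (an axis and its opposite are the same line).
angles = np.deg2rad(np.arange(0.0, 180.0, 0.5))
axes = np.column_stack([np.cos(angles), np.sin(angles)])   # one unit vector per row
variances = (centred @ axes.T).var(axis=0, ddof=1)         # variance of each projection

best = angles[np.argmax(variances)]
worst = angles[np.argmin(variances)]

# Closed-form answer: top eigenvector of the covariance matrix.
eigvals, eigvecs = np.linalg.eigh(np.cov(centred, rowvar=False))
pc1 = eigvecs[:, -1]
pc1_angle = np.arctan2(pc1[1], pc1[0]) % np.pi             # fold into [0, 180) degrees

print(f"sweep peak: {np.rad2deg(best):.1f} deg, eigenvector: {np.rad2deg(pc1_angle):.1f} deg")
print(f"sweep minimum: {np.rad2deg(worst):.1f} deg")

# Flat top: 5 degrees off the peak costs only a small share of the peak variance.
shifted = pc1_angle + np.deg2rad(5)
off_peak = np.array([np.cos(shifted), np.sin(shifted)])
loss = 1 - (centred @ off_peak).var(ddof=1) / eigvals[-1]
print(f"relative variance lost 5 deg off the peak: {loss:.4f}")
```

The sweep only works comfortably in 2-D, where there is a single angle to search; the eigenvector computation gives the same direction in any number of dimensions.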

Choosing how many components

Explained variance

Plot cumulative explained variance against the number of components and keep enough to reach a chosen threshold, say 95%. Often a handful of components capture almost all the variance in dozens of correlated features.
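Here is a minimal sketch of that selection rule, without the plot. The dataset is invented for the example (30 correlated features generated from 3 hidden factors plus a little noise), and the 95% target is the illustrative threshold from the text, not a universal default.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical dataset: 30 correlated features driven by 3 hidden factors plus small noise.
n_samples, n_features, n_factors = 400, 30, 3
factors = rng.normal(size=(n_samples, n_factors))
loadings = rng.normal(size=(n_factors, n_features))
X = factors @ loadings + 0.1 * rng.normal(size=(n_samples, n_features))

# Eigenvalues of the covariance matrix, largest first.
Xc = X - X.mean(axis=0)
eigvals = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))[::-1]

# Cumulative share of variance explained by the first 1, 2, 3, ... components.
cumulative = np.cumsum(eigvals) / eigvals.sum()

# Smallest k whose cumulative share reaches the target.
target = 0.95
k = int(np.searchsorted(cumulative, target) + 1)

print(f"components needed for {target:.0%}: {k} of {n_features}")
```

Plotting `cumulative` against `1..n_features` gives the curve described above; `k` is the point where it first crosses the threshold.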

Great for
  • Visualizing high-dimensional data in 2-D
  • Speeding up models by shrinking inputs
  • Removing correlated/redundant features and noise

Watch out
  • Components are hard to interpret (each is a mix of the original features)
  • Assumes linear structure, so it misses patterns that lie on curved manifolds
  • Scale first: PCA is sensitive to feature units

Related

PCA is the classic linear, unsupervised dimensionality-reduction method; for non-linear structure, t-SNE and UMAP are popular alternatives, used mainly for visualization. Scale features first whenever they are measured in different units.
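The advice to scale first is easy to check on a toy example. The two features below (height in metres, income in dollars) are hypothetical and chosen only because their units differ by several orders of magnitude; the helper `first_component` is likewise just for this sketch.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical features in very different units: height in metres, income in dollars.
height_m = rng.normal(1.7, 0.1, size=300)
income_usd = rng.normal(50_000, 15_000, size=300)
X = np.column_stack([height_m, income_usd])

def first_component(data):
    """Top eigenvector of the covariance matrix of `data`."""
    centred = data - data.mean(axis=0)
    _, eigvecs = np.linalg.eigh(np.cov(centred, rowvar=False))
    return eigvecs[:, -1]

# Unscaled: income's huge numeric variance dominates, so PC1 is almost purely income.
pc1_raw = first_component(X)

# Standardized (zero mean, unit variance per feature): both features count equally.
X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
pc1_std = first_component(X_std)

print("PC1 weights, raw units:    ", np.abs(pc1_raw).round(3))
print("PC1 weights, standardized: ", np.abs(pc1_std).round(3))
```

Without scaling, PC1 simply points along whichever feature has the largest numbers, which says more about the choice of units than about the data.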