Hierarchical Clustering
A tree of nested groups
Instead of fixing the number of clusters up front like K-Means, hierarchical clustering builds a whole tree of groupings — from every point alone, up to one big cluster containing everything.
The most common variant is agglomerative (bottom-up): start with each point as its own cluster, then repeatedly merge the two closest clusters until only one remains. The record of those merges is a tree called a dendrogram.
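The merge loop described above can be sketched in plain Python. This is a minimal, deliberately naive sketch (roughly O(n³) time) that assumes Euclidean distance and single linkage, where cluster distance is the closest pair of points. The function name `agglomerate` and the merge-record format `(cluster_a, cluster_b, distance)` are hypothetical choices for illustration. Original points are clusters 0 to n-1, and each merge creates a new cluster numbered n, n+1, and so on.

```python
import math
from itertools import combinations


def agglomerate(points):
    """Naive agglomerative clustering with single linkage (illustrative sketch).

    Returns the merge record: a list of (cluster_a, cluster_b, distance) tuples.
    Points are clusters 0..n-1; each merge creates a new id n, n+1, ...
    """
    n = len(points)
    # Each active cluster id maps to the list of point indices it contains.
    clusters = {i: [i] for i in range(n)}
    merges = []
    next_id = n

    def single_link(a, b):
        # Single linkage: distance of the closest pair, one point from each cluster.
        return min(math.dist(points[i], points[j])
                   for i in clusters[a] for j in clusters[b])

    while len(clusters) > 1:
        # Find the closest pair of currently active clusters.
        a, b = min(combinations(sorted(clusters), 2),
                   key=lambda pair: single_link(*pair))
        merges.append((a, b, single_link(a, b)))
        # Replace the two clusters with their union under a fresh id.
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return merges


if __name__ == "__main__":
    # Five hypothetical 2-D points: two tight pairs and one outlier.
    pts = [(0, 0), (0, 1), (5, 5), (5, 7), (10, 0)]
    for a, b, d in agglomerate(pts):
        print(a, b, round(d, 2))
    # 0 1 1.0
    # 2 3 2.0
    # 5 6 6.4
    # 4 7 7.07
```

The merge record is exactly what a dendrogram draws: each tuple becomes one link, placed at the height given by its distance.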
How the dendrogram builds
Take six points, each starting as its own cluster. Each step merges the closest pair of clusters and draws a new link at a height equal to that merge distance. Finally, a horizontal cut line across the tree picks the number of clusters.
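A six-point walkthrough is easiest to follow in one dimension. For points on a line under single linkage, the closest two clusters are always separated by a single gap between neighboring sorted values, so closing the gaps from smallest to largest reproduces the merge sequence. The sketch below assumes single linkage and six hypothetical 1-D values; the name `merge_heights_1d` is illustrative, not from any library.

```python
def merge_heights_1d(values):
    """Single-linkage merge sequence for 1-D points (illustrative sketch).

    Returns (height, members) pairs in merge order.
    """
    xs = sorted(values)
    # Gap i sits between xs[i] and xs[i + 1]; close gaps smallest first.
    gaps = sorted((xs[i + 1] - xs[i], i) for i in range(len(xs) - 1))
    group = {i: [xs[i]] for i in range(len(xs))}  # group key -> sorted members
    owner = list(range(len(xs)))                    # owner[j] = group key of xs[j]
    steps = []
    for height, i in gaps:
        left, right = owner[i], owner[i + 1]
        # Groups are contiguous intervals, so left + right stays sorted.
        merged = group.pop(left) + group.pop(right)
        group[left] = merged
        for j in range(len(xs)):
            if owner[j] == right:
                owner[j] = left
        steps.append((height, merged))
    return steps


if __name__ == "__main__":
    for height, members in merge_heights_1d([0, 1, 4, 10, 12, 20]):
        print(f"merge at height {height}: {members}")
    # merge at height 1: [0, 1]
    # merge at height 2: [10, 12]
    # merge at height 3: [0, 1, 4]
    # merge at height 6: [0, 1, 4, 10, 12]
    # merge at height 8: [0, 1, 4, 10, 12, 20]
```

Each printed line is one link in the dendrogram, and the heights only ever increase, which is why the tree can be drawn with every link above the ones it joins.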
The dendrogram is the answer
The height of each link shows how far apart the two merged clusters were: low links join very similar clusters.
Draw a horizontal line across the tree; the number of vertical branches it crosses is your number of clusters. No need to pick K in advance.
You see sub-clusters inside clusters: a full hierarchy, not a flat partition.
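Cutting the tree at height h amounts to replaying only the merges whose height is at most h; whatever groups exist at that point are the clusters, so the count is n minus the number of merges below the cut. A minimal union-find sketch follows. It assumes heights never decrease going up the tree (true for single, complete, average and Ward linkage), and a hypothetical merge-record format of `(i, j, height)` triples naming one point from each side of the merge. SciPy's `fcluster` with `criterion="distance"` is a library route to the same idea.

```python
def cut_tree(n, merges, height):
    """Flat cluster labels after cutting a dendrogram at `height` (sketch).

    n      -- number of original points
    merges -- (i, j, h) triples: at height h, the cluster containing point i
              merged with the cluster containing point j
    Returns one label per point; equal labels mean the same cluster.
    """
    parent = list(range(n))

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j, h in merges:
        if h <= height:  # merges above the cut line are ignored
            parent[find(i)] = find(j)

    # Relabel roots as 0, 1, 2, ... in order of first appearance.
    labels, roots = [], {}
    for p in range(n):
        labels.append(roots.setdefault(find(p), len(roots)))
    return labels


if __name__ == "__main__":
    # Hypothetical single-linkage merges for the points 0, 1, 4, 10, 12, 20.
    merges = [(0, 1, 1), (3, 4, 2), (1, 2, 3), (2, 3, 6), (4, 5, 8)]
    print(cut_tree(6, merges, height=4.5))  # [0, 0, 0, 1, 1, 2] -> three clusters
    print(cut_tree(6, merges, height=7))    # [0, 0, 0, 0, 0, 1] -> two clusters
```

Moving the cut line is just calling the function again with a different height; the tree itself never has to be rebuilt.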
Linkage — how to measure cluster distance
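Single linkage: distance = the nearest two points, one from each cluster. Can chain into long, straggly clusters.
Complete linkage: distance = the farthest two points. Tends to make compact, round clusters.
Average linkage (mean distance over all cross-cluster pairs) or Ward's method (merge the pair that least increases within-cluster variance). Popular defaults.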
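Each linkage is just a different function of the distances between two clusters' points. A minimal sketch, assuming Euclidean distance and clusters given as lists of coordinate tuples; the function names are illustrative. Ward is written here as the raw increase in within-cluster sum of squares, |A||B| / (|A| + |B|) times the squared distance between centroids; libraries may report a rescaled version of this quantity as the merge height.

```python
import math


def single(A, B):
    # Nearest pair: smallest distance between a point of A and a point of B.
    return min(math.dist(a, b) for a in A for b in B)


def complete(A, B):
    # Farthest pair: largest distance between a point of A and a point of B.
    return max(math.dist(a, b) for a in A for b in B)


def average(A, B):
    # Mean over all |A| * |B| cross-cluster distances.
    return sum(math.dist(a, b) for a in A for b in B) / (len(A) * len(B))


def centroid(C):
    # Coordinate-wise mean of the points in C.
    return [sum(coord) / len(C) for coord in zip(*C)]


def ward_cost(A, B):
    # Increase in total squared distance to centroids if A and B are merged.
    d = math.dist(centroid(A), centroid(B))
    return len(A) * len(B) / (len(A) + len(B)) * d ** 2


if __name__ == "__main__":
    A = [(0, 0), (0, 2)]
    B = [(3, 0), (5, 0)]
    print(round(single(A, B), 3))     # 3.0
    print(round(complete(A, B), 3))   # 5.385
    print(round(average(A, B), 3))    # 4.248
    print(round(ward_cost(A, B), 3))  # 17.0
```

Single linkage looks only at the closest bridge between two groups, which is why a chain of near neighbors can pull distant points together; complete linkage looks at the worst pair, which penalizes elongated clusters.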
Hierarchical vs K-Means
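Choose hierarchical clustering when:
- You don't want to pre-commit to K
- You want a nested hierarchy to explore
- The dataset is small to medium
Choose K-Means when:
- The dataset is large (hierarchical clustering needs O(n²) memory for the pairwise distances, and a naive implementation takes O(n³) time)
- Clusters are roughly round and equal-sized
- You already know roughly how many groups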
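The size limit is easy to quantify. Many agglomerative implementations start from the full set of pairwise distances, n(n − 1)/2 values. A back-of-the-envelope sketch, assuming one 8-byte floating-point value per pair; the helper name is hypothetical.

```python
def distance_matrix_bytes(n, bytes_per_value=8):
    # A condensed (upper-triangle) distance matrix stores one value per pair.
    pairs = n * (n - 1) // 2
    return pairs * bytes_per_value


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        print(f"n = {n:>7,}: {distance_matrix_bytes(n) / 1e9:,.3f} GB")
    # n =   1,000: 0.004 GB
    # n =  10,000: 0.400 GB
    # n = 100,000: 40.000 GB
```

Ten times more points means roughly a hundred times more memory, while K-Means only keeps the points and K centroids.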