Hierarchical Clustering

A tree of nested groups

Instead of fixing the number of clusters up front like K-Means, hierarchical clustering builds a whole tree of groupings — from every point alone, up to one big cluster containing everything.

The most common variant is agglomerative (bottom-up): start with each point as its own cluster, then repeatedly merge the two closest clusters until only one remains. The record of those merges is a tree called a dendrogram.
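The merge loop described above can be sketched in a few lines of plain Python. This is a deliberately naive illustration, not an efficient implementation: it assumes Euclidean distance and single linkage (closest pair), and the names `agglomerate`, `single_linkage`, and the six sample points are made up for this example.

```python
from itertools import combinations
from math import dist


def single_linkage(a, b, points):
    """Cluster distance = distance between the closest pair of members."""
    return min(dist(points[i], points[j]) for i in a for j in b)


def agglomerate(points):
    """Naive bottom-up clustering; returns the merge history.

    Each entry is (members_a, members_b, height), where height is the
    distance between the two clusters at the moment they merged.
    """
    clusters = [(i,) for i in range(len(points))]  # every point starts alone
    merges = []
    while len(clusters) > 1:
        # Find the closest pair among the current clusters.
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: single_linkage(pair[0], pair[1], points),
        )
        height = single_linkage(a, b, points)
        merges.append((a, b, height))
        # Replace the two clusters with their union.
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(tuple(sorted(a + b)))
    return merges


# Six hypothetical 2-D points (illustrative values only).
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.5), (11.0, 0.0), (11.0, 4.0)]
merges = agglomerate(points)
for a, b, height in merges:
    print(f"merge {a} + {b} at height {height:.2f}")
```

With these points the merge heights come out as 1.0, 1.5, 4.0, 5.0, and 6.0: n points always produce n - 1 merges, and the list of merges is exactly the information a dendrogram draws.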

Watch the dendrogram build

The demo starts with six points, each in its own cluster. Each step merges the closest pair of clusters and draws a new link at a height equal to their merge distance. A horizontal cut line then shows how slicing the tree sets the number of clusters.

The dendrogram is the answer

Height = distance when they merged

The vertical level of each link is the distance between the two clusters at the moment they merged. Low merges join very similar clusters; high merges join dissimilar ones.

Cut to choose K: slice horizontally

Draw a horizontal line across the dendrogram; the number of vertical branches it crosses is the number of clusters. Move the line up for fewer clusters or down for more, with no need to fix K in advance.
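Cutting the tree amounts to applying only the merges at or below the cut line. Here is a minimal sketch of that idea, assuming the merge history is a list of `(members_a, members_b, height)` tuples; the `cut_tree` helper and the hand-written merge values are hypothetical, chosen only for illustration.

```python
def cut_tree(n_points, merges, cut_height):
    """Return one cluster label per point after cutting at cut_height.

    merges: list of (members_a, members_b, height) in merge order.
    Only merges at or below the cut line are applied.
    """
    parent = list(range(n_points))  # union-find forest, one root per cluster

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b, height in merges:
        if height <= cut_height:
            parent[find(a[0])] = find(b[0])

    roots = [find(i) for i in range(n_points)]
    # Relabel roots as 0, 1, 2, ... in order of first appearance.
    label_of = {}
    return [label_of.setdefault(r, len(label_of)) for r in roots]


# Hypothetical merge history for six points (illustrative heights).
merges = [
    ((0,), (1,), 1.0),
    ((2,), (3,), 1.5),
    ((4,), (5,), 4.0),
    ((0, 1), (2, 3), 5.0),
    ((4, 5), (0, 1, 2, 3), 6.0),
]

print(cut_tree(6, merges, 2.0))  # [0, 0, 1, 1, 2, 3]  -> 4 clusters
print(cut_tree(6, merges, 4.5))  # [0, 0, 1, 1, 2, 2]  -> 3 clusters
print(cut_tree(6, merges, 5.5))  # [0, 0, 0, 0, 1, 1]  -> 2 clusters
```

The same merge history yields every possible K; only the cut height changes.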

Nested structure: groups within groups

You see sub-clusters inside clusters — a full hierarchy, not a flat partition.

Linkage — how to measure cluster distance

Single: closest pair

Distance between the two nearest points, one from each cluster. Can chain points into long, straggly clusters.

Complete: farthest pair

Distance between the two farthest points, one from each cluster. Tends to produce compact, roughly round clusters.

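Average / Ward: balanced

Average linkage uses the mean distance over all cross-cluster pairs. Ward merges the pair of clusters whose union least increases the total within-cluster variance. Both are popular defaults.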
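To make the differences concrete, here is a small sketch that scores the same pair of clusters under each linkage. It assumes Euclidean distance, and it expresses Ward's criterion as the increase in within-cluster sum of squared distances to the centroid, one common way to write it. The function names and the two sample clusters are invented for this example.

```python
from itertools import product
from math import dist


def centroid(cluster):
    """Coordinate-wise mean of the points in a cluster."""
    return tuple(sum(xs) / len(cluster) for xs in zip(*cluster))


def single(a, b):
    """Closest cross-cluster pair."""
    return min(dist(p, q) for p, q in product(a, b))


def complete(a, b):
    """Farthest cross-cluster pair."""
    return max(dist(p, q) for p, q in product(a, b))


def average(a, b):
    """Mean distance over all cross-cluster pairs."""
    return sum(dist(p, q) for p, q in product(a, b)) / (len(a) * len(b))


def ward_increase(a, b):
    """Increase in total within-cluster sum of squares if a and b merge."""
    na, nb = len(a), len(b)
    return na * nb / (na + nb) * dist(centroid(a), centroid(b)) ** 2


# Two hypothetical clusters on a line (illustrative values only).
a = [(0.0, 0.0), (1.0, 0.0)]
b = [(4.0, 0.0), (6.0, 0.0)]

print(f"single:   {single(a, b):.2f}")         # 3.00
print(f"complete: {complete(a, b):.2f}")       # 6.00
print(f"average:  {average(a, b):.2f}")        # 4.50
print(f"ward:     {ward_increase(a, b):.2f}")  # 20.25
```

Single and complete bracket the range of cross-cluster distances, with average in between. The Ward score is on a squared scale, so it is only comparable with other Ward scores.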

Hierarchical vs K-Means

Hierarchical wins when
  • You don't want to pre-commit to K
  • You want a nested hierarchy to explore
  • The dataset is small to medium
K-Means wins when
  • The dataset is large (hierarchical typically needs at least O(n²) time, and standard implementations store O(n²) pairwise distances)
  • Clusters are roughly round and equal-sized
  • You already know roughly how many groups
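A back-of-the-envelope calculation shows why dataset size matters. The sketch below counts the pairwise distances among n points and estimates the memory needed to hold them, assuming each distance is stored once as an 8-byte float. That storage layout is an assumption for illustration; actual implementations vary.

```python
def pairwise_count(n):
    """Number of distinct point pairs among n points: n(n-1)/2."""
    return n * (n - 1) // 2


for n in (1_000, 10_000, 100_000):
    pairs = pairwise_count(n)
    gib = pairs * 8 / 2**30  # assumes one 8-byte float per pair
    print(f"n = {n:>7,}: {pairs:>13,} distances, about {gib:.2f} GiB")
```

Ten times more points means roughly a hundred times more distances, which is why hierarchical clustering is usually reserved for small to medium datasets.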