DBSCAN Clustering


Clusters are dense neighbourhoods

K-Means needs you to pick K and assumes round, evenly-sized blobs. DBSCAN (Density-Based Spatial Clustering of Applications with Noise) asks a different question: where are points packed tightly together? It grows clusters from dense regions, finds any shape, picks the number of clusters itself, and leaves sparse points unassigned as noise.

ε (epsilon): the radius

How close two points must be to count as neighbours.

minPts: the crowd size

How many neighbours within ε a point needs to be a dense "core".

No K: found, not set

The number of clusters falls out of the density — you never specify it.
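To make ε and minPts concrete, here is a minimal sketch of the density test in plain Python. The helper names `region_query` and `is_core` are made up for illustration, and the sketch assumes the common convention (also used by scikit-learn's `min_samples`) that a point counts itself as one of its own neighbours.

```python
import math

def region_query(points, i, eps):
    """Indices of every point within eps of points[i], the point itself included."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def is_core(points, i, eps, min_pts):
    """A point is a dense 'core' if its eps-neighbourhood holds at least min_pts points."""
    return len(region_query(points, i, eps)) >= min_pts

# Four corners of a unit square plus one far-away point.
points = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 10)]
print([is_core(points, i, eps=1.5, min_pts=4) for i in range(len(points))])
# → [True, True, True, True, False]
```

With ε = 1.5 every corner of the square can see the other three (the diagonal is about 1.41), so each reaches the crowd size of 4. The far-away point sees only itself.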

Core, border, noise

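Each point is labelled by what's around it: a core point has ≥ minPts neighbours within ε; a border point lies within ε of a core point but has too few neighbours to be core itself; a noise point is neither core nor within ε of any core point. Clusters grow by chaining together core points that lie within ε of each other, and each border point joins the cluster of a core point it touches.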
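The labelling and chaining described above can be sketched in a few lines. This is a brute-force illustration (every pair of points is compared), not an optimised or reference implementation. The function name `dbscan`, the `-1` noise marker and the convention that a point counts as its own neighbour are assumptions of this sketch.

```python
import math
from collections import deque

NOISE = -1  # assumed marker for "no cluster"

def dbscan(points, eps, min_pts):
    """Return (cluster label per point, 'core'/'border'/'noise' per point)."""
    n = len(points)
    # Neighbourhood of each point, the point itself included.
    neighbours = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
                  for i in range(n)]
    core = [len(nb) >= min_pts for nb in neighbours]
    labels = [NOISE] * n
    cluster = 0
    for i in range(n):
        if not core[i] or labels[i] != NOISE:
            continue  # only an unclaimed core point can start a new cluster
        labels[i] = cluster
        queue = deque([i])
        while queue:
            p = queue.popleft()
            for q in neighbours[p]:
                if labels[q] == NOISE:
                    labels[q] = cluster   # claim the neighbour for this cluster
                    if core[q]:
                        queue.append(q)   # only core points keep the chain growing
        cluster += 1
    kinds = ["core" if core[i] else "border" if labels[i] != NOISE else "noise"
             for i in range(n)]
    return labels, kinds

# Four points in a row, one unit apart, plus a far-away straggler.
points = [(0, 0), (1, 0), (2, 0), (3, 0), (10, 0)]
labels, kinds = dbscan(points, eps=1.1, min_pts=3)
print(labels)  # → [0, 0, 0, 0, -1]
print(kinds)   # → ['border', 'core', 'core', 'border', 'noise']
```

The two end points of the row each see only one other point, so they are not dense themselves, but they sit within ε of a core point and get pulled into its cluster as borders. A border point within reach of two clusters goes to whichever one claims it first, so its assignment can depend on point order.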

DBSCAN vs K-Means

K-Means
  • You must choose K up front
  • Assumes round, similar-size clusters
  • Every point forced into a cluster

DBSCAN
  • Discovers the cluster count from density
  • Handles crescents, rings, any shape
  • Labels outliers as noise instead of forcing them

The catch

DBSCAN struggles when clusters have very different densities (one ε can't fit all), and choosing ε takes care. Variants like HDBSCAN relax the single-ε assumption.
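A toy example shows why one ε can't fit all. The data below is invented for illustration: two tightly packed groups sitting close together, plus one loosely packed group further away. The `dbscan_labels` helper is a compact brute-force sketch (a point counts itself as a neighbour), not a library function.

```python
import math

def dbscan_labels(points, eps, min_pts):
    """Brute-force DBSCAN sketch; returns a cluster id per point, -1 for noise."""
    n = len(points)
    nbrs = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
            for i in range(n)]
    labels = [-1] * n
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or len(nbrs[i]) < min_pts:
            continue  # already claimed, or not a core point
        labels[i] = cluster
        stack = [i]
        while stack:
            p = stack.pop()
            for q in nbrs[p]:
                if labels[q] == -1:
                    labels[q] = cluster
                    if len(nbrs[q]) >= min_pts:  # expand only through core points
                        stack.append(q)
        cluster += 1
    return labels

def summarise(points, eps, min_pts=3):
    """(number of clusters, number of noise points) for one setting of eps."""
    labels = dbscan_labels(points, eps, min_pts)
    return len(set(labels) - {-1}), labels.count(-1)

tight_a = [(x, 0) for x in range(0, 5)]      # spacing 1
tight_b = [(x, 0) for x in range(7, 12)]     # spacing 1, gap of 3 from tight_a
loose = [(x, 0) for x in range(30, 47, 4)]   # spacing 4
points = tight_a + tight_b + loose

for eps in (1.5, 3.5, 4.5):
    print(eps, summarise(points, eps))
# → 1.5 (2, 5)
# → 3.5 (1, 5)
# → 4.5 (2, 0)
```

A small ε separates the two tight groups but writes the loose group off as noise. An ε large enough to hold the loose group together has already bridged the gap between the tight ones. No setting recovers all three groups at once.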

Feel that catch yourself. Below are two interleaved crescents, a blob and four outliers: a layout K-Means cannot separate, because it carves space along straight boundaries between centroids. The sliders run real DBSCAN on every change: big points with halos are cores, smaller ones are borders, grey ones are noise. The dashed circle shows the current ε for scale.

Shrink ε toward 15 and watch everything dissolve into grey noise; grow it past ~75 and the two crescents fuse into one cluster. The sweet spot in between finds all three shapes and rejects the outliers — and notice the crescents stay crescent-shaped, which K-Means cannot do.
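The same sweep can be approximated offline. The sketch below builds a hand-made stand-in for the demo's data (two interleaved crescents, a 3×3 blob, four outliers) and runs a brute-force DBSCAN at three ε values. The coordinates, the ε values and minPts = 4 are all invented for this sketch and are in their own arbitrary units, so they do not correspond to the slider values 15 and 75 mentioned above.

```python
import math

def dbscan_labels(points, eps, min_pts):
    """Brute-force DBSCAN sketch; returns a cluster id per point, -1 for noise."""
    n = len(points)
    nbrs = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
            for i in range(n)]
    labels = [-1] * n
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or len(nbrs[i]) < min_pts:
            continue  # already claimed, or not a core point
        labels[i] = cluster
        stack = [i]
        while stack:
            p = stack.pop()
            for q in nbrs[p]:
                if labels[q] == -1:
                    labels[q] = cluster
                    if len(nbrs[q]) >= min_pts:  # expand only through core points
                        stack.append(q)
        cluster += 1
    return labels

# Hypothetical data: 21 points per crescent, about 15.7 units apart along the arc.
angles = [math.pi * k / 20 for k in range(21)]
upper = [(100 * math.cos(t), 100 * math.sin(t)) for t in angles]
lower = [(100 - 100 * math.cos(t), 50 - 100 * math.sin(t)) for t in angles]
blob = [(250 + dx, 150 + dy) for dx in (-12, 0, 12) for dy in (-12, 0, 12)]
outliers = [(-200, -150), (320, -100), (-180, 220), (150, 300)]
points = upper + lower + blob + outliers

for eps in (10, 35, 60):
    labels = dbscan_labels(points, eps, min_pts=4)
    clusters = len(set(labels) - {-1})
    print(f"eps={eps}: {clusters} clusters, {labels.count(-1)} noise points")
# → eps=10: 0 clusters, 55 noise points
# → eps=35: 3 clusters, 4 noise points
# → eps=60: 2 clusters, 4 noise points
```

In this layout the crescents come within 50 units of each other, so any ε below that keeps them apart and anything above it lets a core point in one crescent reach into the other. At ε = 35 each crescent keeps a single label from tip to tip, and the four outliers stay unassigned.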