DBSCAN Clustering · Suman Bhadra Notes

Clusters are dense neighbourhoods

K-Means needs you to pick K and assumes round, evenly-sized blobs. DBSCAN (Density-Based Spatial Clustering of Applications with Noise) asks a different question: where are points packed tightly together? It grows clusters from dense regions, finds any shape, picks the number of clusters itself, and leaves sparse points unassigned as noise.

ε (epsilon): the radius

How close two points must be to count as neighbours.

minPts: the crowd size

How many points must lie within ε of a point (counting the point itself) for it to be a dense "core".

No K: found, not set

The number of clusters falls out of the density — you never specify it.

Core, border, noise

Each point is labelled by what's around it: a core point has at least minPts points within ε (counting itself); a border point lies within ε of a core point but has fewer than minPts points in its own neighbourhood; a noise point is neither, being too sparse to be a core and not within ε of any core. Clusters grow by chaining together core points that lie within ε of one another, and each border point joins a cluster whose core it touches.
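The three labels can be sketched in a few lines of plain Python. This is a minimal illustration, not a tuned implementation: the function name classify_points and the six toy points are made up for the example, and distances are plain Euclidean with a brute-force neighbour search.

```python
from math import dist  # Euclidean distance, Python 3.8+

def classify_points(points, eps, min_pts):
    """Label each point 'core', 'border', or 'noise' (hypothetical helper)."""
    # Neighbourhood of a point: indices of all points within eps, itself included.
    neighbours = [
        [j for j, q in enumerate(points) if dist(p, q) <= eps]
        for p in points
    ]
    is_core = [len(n) >= min_pts for n in neighbours]

    labels = []
    for i in range(len(points)):
        if is_core[i]:
            labels.append("core")
        elif any(is_core[j] for j in neighbours[i]):
            labels.append("border")   # not dense itself, but within eps of a core
        else:
            labels.append("noise")    # neither dense nor near a core
    return labels

# Toy data: a unit square, one point hanging off its side, one far outlier.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1), (5, 5)]
print(classify_points(points, eps=1.0, min_pts=3))
# ['core', 'core', 'core', 'core', 'border', 'noise']
```

The point at (2, 1) has only two points in its neighbourhood, so it is not a core, but it sits within ε of the core at (1, 1) and so becomes a border point. The point at (5, 5) is near nothing and is left as noise.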

DBSCAN vs K-Means

K-Means

You must choose K up front
Assumes round, similar-size clusters
Every point forced into a cluster

DBSCAN

Discovers the cluster count from density
Handles crescents, rings, any shape
Labels outliers as noise instead of forcing them

The catch

DBSCAN struggles when clusters have very different densities (one ε can't fit all), and choosing ε takes care. Variants like HDBSCAN relax the single-ε assumption.
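One common heuristic for choosing ε, not covered in these notes, is to sort every point's distance to its k-th nearest neighbour and look for a jump in the curve. The sketch below uses it only to make the density problem visible. The helper name k_distances and the toy data are hypothetical. Taking k = minPts - 1 is an assumption that follows from counting the point itself towards minPts.

```python
from math import dist  # Euclidean distance, Python 3.8+

def k_distances(points, k):
    """Distance from each point to its k-th nearest other point, sorted ascending."""
    result = []
    for i, p in enumerate(points):
        others = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(others[k - 1])
    return sorted(result)

# Hypothetical toy data: a tight cluster (spacing 1) and a loose one (spacing 5).
dense = [(x, 0) for x in range(5)]               # x = 0, 1, 2, 3, 4
sparse = [(100 + 5 * x, 0) for x in range(5)]    # x = 100, 105, ..., 120

print(k_distances(dense + sparse, k=2))
# [1.0, 1.0, 1.0, 2.0, 2.0, 5.0, 5.0, 5.0, 10.0, 10.0]
```

The sorted distances fall into two plateaus, one per density. An ε near 2 suits the tight cluster but leaves the loose one as noise, while an ε large enough for the loose cluster is far coarser than the tight one needs. That is the single-ε limitation in miniature.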

Feel that catch yourself. Below are two interleaved crescents, a blob and four outliers — a shape K-Means can never separate. The sliders run real DBSCAN on every change: big points with halos are cores, smaller ones are borders, grey ones are noise. The dashed circle shows the current ε for scale.

[Interactive demo: sliders for ε and minPts]

Shrink ε toward 15 and watch nearly everything dissolve into grey noise (at minPts = 3 one tight knot of four blob points holds out; nudge minPts up to clear it); grow it past ~70 and the two crescents fuse into one cluster. The sweet spot in between finds all three shapes and rejects the outliers — and notice the crescents stay crescent-shaped, which K-Means cannot do.
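The same three regimes can be reproduced without the sliders. What follows is a minimal brute-force sketch of DBSCAN in plain Python, run at three ε values. The toy data (two groups on a line plus one outlier) and its ε values are assumptions for illustration and do not match the pixel scale of the demo above.

```python
from math import dist  # Euclidean distance, Python 3.8+

NOISE = -1

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, 2, ... for clusters, -1 for noise."""
    n = len(points)
    # Neighbourhood of i: all points within eps, including i itself.
    neighbours = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n              # None means "not visited yet"
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_pts:
            labels[i] = NOISE        # may be claimed later as a border point
            continue
        labels[i] = cluster          # i is a core point: start a new cluster
        frontier = list(neighbours[i])
        while frontier:
            j = frontier.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_pts:
                frontier.extend(neighbours[j])  # j is core: keep chaining
        cluster += 1
    return labels

def summarise(labels):
    """Return (number of clusters, number of noise points)."""
    return len({l for l in labels if l != NOISE}), labels.count(NOISE)

# Two groups of four points on a line, plus one far outlier.
points = [(x, 0) for x in (0, 1, 2, 3, 6, 7, 8, 9, 20)]
for eps in (0.5, 1.0, 3.0):
    print(eps, summarise(dbscan(points, eps, min_pts=3)))
# 0.5 (0, 9)   too small: everything is noise
# 1.0 (2, 1)   sweet spot: two clusters, outlier rejected
# 3.0 (1, 1)   too large: the gap is bridged and the groups fuse
```

The neighbour search here compares every pair of points, which is fine for a handful of points but slow for large data sets. Library implementations typically use a spatial index instead.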