DBSCAN Clustering
Clusters are dense neighbourhoods
K-Means needs you to pick K and assumes round, evenly-sized blobs. DBSCAN (Density-Based Spatial Clustering of Applications with Noise) asks a different question: where are points packed tightly together? It grows clusters from dense regions, finds any shape, picks the number of clusters itself, and leaves sparse points unassigned as noise.
ε (epsilon): how close two points must be to count as neighbours.
minPts: how many neighbours within ε a point needs to be a dense "core".
The number of clusters falls out of the density — you never specify it.
Core, border, noise
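Each point is labelled by what's around it: a core point has ≥ minPts neighbours within ε; a border point has fewer than that but lies within ε of a core point; a noise point is neither, so it stays unassigned. Clusters grow by chaining together core points that lie within ε of each other, and each border point joins the cluster of a core point it neighbours.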
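The labelling and chaining rules above can be sketched in a few lines of plain Python. This is a minimal illustration, not a tuned implementation: the names `region_query` and `dbscan`, the toy coordinates, and the choice to count a point as its own neighbour are all assumptions made for the example (conventions on self-counting differ between implementations).

```python
from math import dist

NOISE = -1

def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (the point itself included)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Return (labels, kinds): a cluster id or NOISE per point, plus its role."""
    n = len(points)
    neighbours = [region_query(points, i, eps) for i in range(n)]
    is_core = [len(nb) >= min_pts for nb in neighbours]
    labels = [None] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not None or not is_core[i]:
            continue
        # Start a new cluster at an unassigned core point and chain outward.
        labels[i] = cluster
        frontier = [i]
        while frontier:
            p = frontier.pop()  # p is always a core point
            for q in neighbours[p]:
                if labels[q] is None:
                    labels[q] = cluster
                    if is_core[q]:
                        frontier.append(q)  # only cores keep expanding
        cluster += 1
    labels = [NOISE if lab is None else lab for lab in labels]
    kinds = [
        "core" if is_core[i] else "border" if labels[i] != NOISE else "noise"
        for i in range(n)
    ]
    return labels, kinds

# Hypothetical toy data: two tight squares, one straggler, one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (2.2, 1),
          (10, 10), (10, 11), (11, 10), (11, 11), (5, 5)]
labels, kinds = dbscan(points, eps=1.5, min_pts=4)
for p, lab, kind in zip(points, labels, kinds):
    print(p, lab, kind)
```

In this toy set, (2.2, 1) sits within ε of only one other point, which happens to be a core, so it joins that cluster as a border point. (5, 5) has no core nearby and stays noise. The sketch compares every pair of points, which is fine for a handful of points but would need a spatial index for larger data.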
DBSCAN vs K-Means
K-Means:
- You must choose K up front
- Assumes round, similar-size clusters
- Every point forced into a cluster
DBSCAN:
- Discovers the cluster count from density
- Handles crescents, rings, any shape
- Labels outliers as noise instead of forcing them
DBSCAN struggles when clusters have very different densities (one ε can't fit all), and choosing ε takes care. Variants like HDBSCAN relax the single-ε assumption.
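One common heuristic for choosing ε, not described in this lesson, is the k-distance plot: compute each point's distance to its k-th nearest other point, sort those values, and look for a sharp jump. The sketch below assumes plain Python, a hypothetical helper named `k_distances`, and made-up coordinates. If minPts counts the point itself, a natural choice is k = minPts − 1.

```python
from math import dist

def k_distances(points, k):
    """Sorted list of each point's distance to its k-th nearest other point."""
    result = []
    for i, p in enumerate(points):
        others = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(others[k - 1])
    return sorted(result)

# Hypothetical toy data: two tight squares plus one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11), (5, 5)]
kd = k_distances(points, k=3)
for d in kd:
    print(round(d, 3))
```

Here eight of the nine values sit near 1.41 and the last one jumps above 6, so any ε between those two levels treats the squares as dense and the isolated point as noise. When clusters have very different densities the sorted curve tends to show several jumps rather than one, which is the single-ε problem described above.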
See that sensitivity to ε for yourself. Below are two interleaved crescents, a blob and four outliers, an arrangement K-Means cannot separate cleanly because the crescents are not round. The sliders run real DBSCAN on every change: big points with halos are cores, smaller ones are borders, grey ones are noise. The dashed circle shows the current ε for scale.
Shrink ε toward 15 and watch everything dissolve into grey noise; grow it past ~75 and the two crescents fuse into one cluster. The sweet spot in between finds all three shapes and rejects the outliers — and notice the crescents stay crescent-shaped, which K-Means cannot do.
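The same dissolve-then-fuse behaviour can be reproduced offline with a small sweep. The sketch below relies on the fact that the number of DBSCAN clusters equals the number of connected groups of core points, so it counts clusters and noise without assigning labels. The helper name `summarize`, the coordinates, and the ε values are invented for illustration and are not on the same scale as the widget's 15 and 75.

```python
from math import dist

def summarize(points, eps, min_pts):
    """Return (n_clusters, n_noise) for one eps, counting a point as its own neighbour."""
    n = len(points)
    near = [[j for j in range(n) if dist(points[i], points[j]) <= eps]
            for i in range(n)]
    core = [len(near[i]) >= min_pts for i in range(n)]
    seen = set()
    clusters = 0
    for i in range(n):
        if not core[i] or i in seen:
            continue
        clusters += 1  # a new connected group of core points
        seen.add(i)
        stack = [i]
        while stack:
            p = stack.pop()
            for q in near[p]:
                if core[q] and q not in seen:
                    seen.add(q)
                    stack.append(q)
    # Noise: not a core, and no core point within eps.
    noise = sum(1 for i in range(n)
                if not core[i] and not any(core[j] for j in near[i]))
    return clusters, noise

# Hypothetical toy data: two squares three units apart and one far outlier.
points = [(0, 0), (1, 0), (0, 1), (1, 1),
          (4, 0), (5, 0), (4, 1), (5, 1), (20, 20)]
results = {eps: summarize(points, eps, min_pts=3) for eps in (0.5, 1.5, 3.5)}
for eps, (clusters, noise) in results.items():
    print(f"eps={eps}: {clusters} clusters, {noise} noise points")
```

With the smallest ε no point has enough neighbours, so everything is noise. The middle value finds both squares and rejects the outlier. The largest value bridges the gap between the squares and fuses them into one cluster.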