Gaussian Mixture Models
Soft clusters, not hard borders
K-Means draws a hard line: every point belongs to exactly one cluster. But real groups overlap. A Gaussian Mixture Model assumes the data was generated by a few overlapping bell-shaped (Gaussian) blobs, and gives each point a probability of belonging to each one — "70% cluster A, 30% cluster B".
Each Gaussian is described by a centre (mean), a spread and orientation (covariance), and a weight (the share of the data it accounts for).
Points near a boundary are honestly shared between clusters, not forced one way.
Covariance lets blobs stretch and tilt, whereas K-Means, which measures plain straight-line distance to each centre, is stuck with round clusters.
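A minimal sketch in plain Python (no NumPy, to stay self-contained) of how weight, mean and covariance combine into soft memberships: each component's score is weight × density, and normalising the scores gives the "70% A, 30% B" style responsibilities. The two components and their parameters are hypothetical, chosen only for illustration, and the sketch is limited to 2-D points.

```python
import math

def gaussian_pdf_2d(x, mean, cov):
    """Density of a 2-D Gaussian with symmetric covariance cov = [[a, b], [b, d]]."""
    (a, b), (_, d) = cov
    det = a * d - b * b
    if det <= 0:
        raise ValueError("covariance must be positive definite")
    # Inverse of a 2x2 symmetric matrix
    inv = [[d / det, -b / det], [-b / det, a / det]]
    dx = [x[0] - mean[0], x[1] - mean[1]]
    # Squared Mahalanobis distance dx^T * inv * dx: covariance-aware "distance"
    m2 = (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
          + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))
    return math.exp(-0.5 * m2) / (2 * math.pi * math.sqrt(det))

def responsibilities(x, weights, means, covs):
    """Soft membership of point x in each component: weight * density, normalised to sum to 1."""
    scores = [w * gaussian_pdf_2d(x, m, c) for w, m, c in zip(weights, means, covs)]
    total = sum(scores)
    return [s / total for s in scores]

# Hypothetical components: A is round, B is stretched and tilted along the diagonal.
weights = [0.6, 0.4]
means = [(0.0, 0.0), (3.0, 3.0)]
covs = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[2.0, 1.5], [1.5, 2.0]],
]

for point in [(0.0, 0.0), (1.5, 1.5), (3.0, 3.0)]:
    r = responsibilities(point, weights, means, covs)
    print(point, [round(p, 3) for p in r])
```

The tilt in B's covariance shows up directly in the density: (4, 4) and (4, 2) are the same straight-line distance from B's centre, but (4, 4) lies along the tilt and so gets the higher density, something a single round K-Means distance cannot express.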
Fitting with EM
The Expectation–Maximization (EM) algorithm alternates two steps until the fit stops improving: the E-step computes each point's soft responsibilities under the current Gaussians; the M-step re-estimates each Gaussian's weight, mean and covariance from all the points, weighted by those responsibilities.
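A minimal sketch of the two steps for a one-dimensional mixture, in plain Python. Assumptions not in the text: one dimension (so each covariance is a single variance), initial means placed at evenly spaced quantiles of the data, a small variance floor so a component cannot collapse onto a single point, and stopping once the log-likelihood improves by less than `tol`. The function name `em_gmm_1d` is hypothetical.

```python
import math
import random

def normal_pdf(x, mu, var):
    """Density of a 1-D Gaussian with mean mu and variance var."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gmm_1d(data, k, iters=200, tol=1e-8, min_var=1e-6):
    """Fit a k-component 1-D Gaussian mixture by EM.
    Returns (weights, means, variances, log-likelihood from the last E-step)."""
    n = len(data)
    ordered = sorted(data)
    # Initialise: means at evenly spaced quantiles, overall variance, equal weights
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    mu_all = sum(data) / n
    variances = [sum((x - mu_all) ** 2 for x in data) / n] * k
    weights = [1.0 / k] * k
    prev_ll = ll = -math.inf
    for _ in range(iters):
        # E-step: resp[i][j] = P(component j | x_i) under the current parameters.
        # (Production code works in log space to avoid underflow; this sketch does not.)
        resp, ll = [], 0.0
        for x in data:
            scores = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(scores)
            ll += math.log(total)
            resp.append([s / total for s in scores])
        # M-step: re-estimate each component from responsibility-weighted points
        for j in range(k):
            nj = sum(r[j] for r in resp)          # effective number of points owned
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, min_var)      # floor stops collapse onto one point
        # Stop once the log-likelihood stops improving
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return weights, means, variances, ll

# Usage: synthetic data, 300 points around -2 and 200 points around 3
rng = random.Random(42)
data = [rng.gauss(-2.0, 0.5) for _ in range(300)] + [rng.gauss(3.0, 1.0) for _ in range(200)]
weights, means, variances, loglik = em_gmm_1d(data, k=2)
order = sorted(range(2), key=lambda j: means[j])
print("weights:", [round(weights[j], 2) for j in order])
print("means:  ", [round(means[j], 2) for j in order])
print("vars:   ", [round(variances[j], 2) for j in order])
```

Initialising from quantiles is one simple choice among many; random restarts or a K-Means pass are common alternatives, since EM only finds a local optimum and can settle badly if both components start inside the same cluster.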
GMM vs K-Means
K-Means:
- Hard assignment, one cluster each
- Circular, equal-size clusters only
- A limiting case of GMM (equal weights, one shared round covariance shrunk toward zero)
GMM:
- Soft, probabilistic memberships
- Elliptical, tilted, different-size blobs
- Gives a likelihood you can use for anomaly scores
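Two rows of the comparison can be checked numerically with a small 1-D sketch (plain Python; the weights, means and variances are made up for illustration). With equal weights and one shared variance, shrinking that variance makes the soft responsibilities collapse into K-Means-style hard assignments to the nearest mean; and the mixture's negative log-likelihood works as an anomaly score that grows for points the model finds unlikely.

```python
import math

def responsibilities(x, weights, means, var):
    """Soft memberships for a 1-D mixture whose components share one variance.
    The shared normalising constant cancels, so only weights and distances matter."""
    logs = [math.log(w) - (x - m) ** 2 / (2 * var) for w, m in zip(weights, means)]
    top = max(logs)                      # subtract the max before exp (log-sum-exp trick)
    exps = [math.exp(l - top) for l in logs]
    total = sum(exps)
    return [e / total for e in exps]

def anomaly_score(x, weights, means, var):
    """Negative log-likelihood -log p(x) under the mixture; higher means more anomalous."""
    logs = [math.log(w) - (x - m) ** 2 / (2 * var) - 0.5 * math.log(2 * math.pi * var)
            for w, m in zip(weights, means)]
    top = max(logs)
    return -(top + math.log(sum(math.exp(l - top) for l in logs)))

weights, means = [0.5, 0.5], [0.0, 4.0]

# K-Means limit: the point 1.5 is nearer to mean 0.0; as the shared variance
# shrinks, its membership in that component approaches 1 (a hard assignment).
limit = {var: responsibilities(1.5, weights, means, var) for var in [4.0, 1.0, 0.1, 0.01]}
for var, r in limit.items():
    print(f"var={var}: {[round(p, 4) for p in r]}")

# Anomaly scores with unit variance: near a centre, between centres, far from both
scores = {x: anomaly_score(x, weights, means, 1.0) for x in [0.2, 2.0, 12.0]}
for x, s in scores.items():
    print(f"x={x}: score={s:.2f}")
```

In practice the score would come from a fitted model, and a threshold on it (for example, a high percentile of scores on normal data) is a design choice rather than something the model provides.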
The E-step / M-step dance — guess hidden assignments, then update parameters, repeat — is a general recipe for models with latent variables, well beyond clustering.
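For reference, the standard textbook form of that recipe: with observed data X, latent variables Z and parameters θ, one EM iteration is shown below. For a GMM, Z is each point's component label and θ collects the mixing weights π_k, means μ_k and covariances Σ_k (notation chosen here, not taken from the text above), so the two steps reduce to the responsibilities r_ik and the responsibility-weighted updates. Each iteration never decreases the log-likelihood log p(X | θ).

```latex
\begin{aligned}
\text{E-step:}\quad & Q(\theta \mid \theta^{(t)}) = \mathbb{E}_{Z \sim p(Z \mid X,\, \theta^{(t)})}\big[\log p(X, Z \mid \theta)\big] \\
\text{M-step:}\quad & \theta^{(t+1)} = \arg\max_{\theta}\; Q(\theta \mid \theta^{(t)}) \\[6pt]
\text{GMM E-step:}\quad & r_{ik} = \frac{\pi_k\, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)}{\sum_{j} \pi_j\, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)} \\
\text{GMM M-step:}\quad & N_k = \sum_i r_{ik}, \quad \pi_k = \frac{N_k}{N}, \quad
\mu_k = \frac{1}{N_k} \sum_i r_{ik}\, x_i, \quad
\Sigma_k = \frac{1}{N_k} \sum_i r_{ik}\, (x_i - \mu_k)(x_i - \mu_k)^{\top}
\end{aligned}
```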