Gaussian Mixture Models · Suman Bhadra Notes

Soft clusters, not hard borders

K-Means draws a hard line: every point belongs to exactly one cluster. But real groups overlap. A Gaussian Mixture Model assumes the data was generated by a few overlapping bell-shaped (Gaussian) blobs, and gives each point a probability of belonging to each one — "70% cluster A, 30% cluster B".
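To make the "70% cluster A, 30% cluster B" idea concrete, here is a minimal sketch in one dimension. The two-blob model (equal weights, unit spread, centres at 0 and 4) and the helper names `gaussian_pdf` and `memberships` are assumptions chosen for illustration; nothing here is fitted from data.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def memberships(x, weights, means, stds):
    """Probability that x came from each blob, via Bayes' rule."""
    joint = [w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, stds)]
    total = sum(joint)
    return [j / total for j in joint]

# Hypothetical model: two equally weighted blobs, A centred at 0 and B at 4.
weights, means, stds = [0.5, 0.5], [0.0, 4.0], [1.0, 1.0]

for x in (0.0, 1.8, 2.0, 3.0):
    probs = memberships(x, weights, means, stds)
    print(x, [round(p, 3) for p in probs])
```

Under these assumed parameters the midpoint x = 2.0 is split evenly, while x = 1.8 comes out at roughly 69% A and 31% B. K-Means would hand both points entirely to one cluster.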

Each cluster: a Gaussian

Each one is described by a centre (mean), a spread and orientation (covariance), and a weight (the share of the data it is expected to account for).

Membership: a probability

Points near a boundary are honestly shared between clusters, not forced one way.

Shape: ellipses

Covariance lets blobs stretch and tilt, whereas K-Means, which assigns each point to its nearest centre, implicitly favours round clusters.
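Written out, the three ingredients above (mean, covariance, weight) combine into a single density. The symbols are assumptions, since these notes do not fix a notation: K is the number of blobs, d the dimension of a point x, and \pi_k, \mu_k, \Sigma_k the weight, mean and covariance of blob k. The \pi inside (2\pi)^{d/2} is the usual constant, not a weight.

```latex
p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k),
\qquad \pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1

\mathcal{N}(x \mid \mu, \Sigma)
  = \frac{1}{(2\pi)^{d/2} \, |\Sigma|^{1/2}}
    \exp\!\left( -\tfrac{1}{2} (x - \mu)^{\top} \Sigma^{-1} (x - \mu) \right)
```

The off-diagonal entries of \Sigma_k are what tilt an ellipse. A covariance proportional to the identity gives a round blob.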

Fitting with EM

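The Expectation–Maximization (EM) algorithm alternates two steps until the log-likelihood stops improving: the E-step computes soft responsibilities (each Gaussian's share of each point) given the current Gaussians; the M-step moves and reshapes each Gaussian to fit all the points, weighted by those responsibilities, and updates its weight to match.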
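The two steps can be sketched in a few lines for the one-dimensional case. This is a minimal illustration rather than a production implementation: the function name `em_gmm_1d`, the synthetic data, the starting guess (smallest and largest data point as initial centres) and the variance floor `min_var` are all assumptions made for the example.

```python
import math
import random

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def em_gmm_1d(data, init_means, max_iter=200, tol=1e-6, min_var=1e-6):
    """Fit a 1-D Gaussian mixture with EM; returns (weights, means, variances)."""
    k = len(init_means)
    means = list(init_means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    prev_ll = -math.inf

    for _ in range(max_iter):
        # E-step: responsibility of each Gaussian for each point.
        resp, ll = [], 0.0
        for x in data:
            joint = [weights[j] * gaussian_pdf(x, means[j], variances[j])
                     for j in range(k)]
            total = sum(joint)
            ll += math.log(total)
            resp.append([p / total for p in joint])

        # Stop once the log-likelihood has settled.
        if ll - prev_ll < tol:
            break
        prev_ll = ll

        # M-step: responsibility-weighted mean, variance and weight.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, min_var)  # floor avoids a collapsed blob
            weights[j] = nj / len(data)

    return weights, means, variances

# Synthetic data (assumed for the demo): 300 points near 0, 200 points near 5.
random.seed(0)
data = ([random.gauss(0.0, 1.0) for _ in range(300)]
        + [random.gauss(5.0, 1.5) for _ in range(200)])

weights, means, variances = em_gmm_1d(data, [min(data), max(data)])
print("weights:", [round(w, 2) for w in weights])
print("means:", [round(m, 2) for m in means])
print("variances:", [round(v, 2) for v in variances])
```

Every point contributes to every Gaussian in the M-step, just with a different weight. That is the practical difference from K-Means, where each centre is updated from its own points only.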

GMM vs K-Means

K-Means

Hard assignment — one cluster each
Favours round clusters of similar size
(It's effectively a limiting case of GMM: equal, spherical covariances shrunk towards zero)

GMM

Soft, probabilistic memberships
Elliptical, tilted, different-size blobs
Gives a likelihood: points with low density under the fitted model can be flagged as anomalies
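The anomaly-score idea can be sketched as follows: score each point by its log-density under the mixture and flag the ones that fall below a cut-off. The mixture parameters and the `threshold` value are hypothetical, chosen only for illustration; in practice the cut-off is often picked from the scores of the training data.

```python
import math

def log_density(x, weights, means, stds):
    """log p(x) under a 1-D Gaussian mixture, using log-sum-exp for stability."""
    terms = [
        math.log(w) - math.log(s) - 0.5 * math.log(2.0 * math.pi)
        - 0.5 * ((x - m) / s) ** 2
        for w, m, s in zip(weights, means, stds)
    ]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

# Hypothetical fitted mixture: 60% of the mass near 0, 40% near 5.
weights, means, stds = [0.6, 0.4], [0.0, 5.0], [1.0, 1.5]
threshold = -8.0  # assumed cut-off on the log-density

for x in (0.2, 2.5, 12.0):
    score = log_density(x, weights, means, stds)
    label = "anomaly" if score < threshold else "ok"
    print(x, round(score, 2), label)
```

With these assumed numbers, only x = 12.0 falls below the cut-off, since it sits far from both blobs. scikit-learn's `GaussianMixture` exposes a comparable per-sample log-likelihood through `score_samples`.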

Same EM idea, everywhere

The E-step / M-step dance — guess hidden assignments, then update parameters, repeat — is a general recipe for models with latent variables, well beyond clustering.