Gaussian Mixture Models

Keywords: ML, clustering, EM algorithm, soft assignment

Soft clusters, not hard borders

K-Means draws a hard line: every point belongs to exactly one cluster. But real groups overlap. A Gaussian Mixture Model assumes the data was generated by a few overlapping bell-shaped (Gaussian) blobs, and gives each point a probability of belonging to each one — "70% cluster A, 30% cluster B".
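As a rough illustration of what "a probability of belonging to each one" means, the sketch below computes membership probabilities for a one-dimensional mixture with Bayes' rule. The two clusters, their parameters, and the helper names (`gaussian_pdf`, `memberships`) are assumptions made up for this example.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def memberships(x, components):
    """Probability that x belongs to each component.

    components: list of (weight, mean, std) tuples whose weights sum to 1.
    """
    # Weighted density of x under each component, then normalise to sum to 1.
    joint = [w * gaussian_pdf(x, m, s) for w, m, s in components]
    total = sum(joint)
    return [j / total for j in joint]

# Hypothetical mixture: cluster A centred at 0, cluster B centred at 4.
components = [(0.5, 0.0, 1.0), (0.5, 4.0, 1.0)]

for x in (0.0, 1.79, 2.0, 4.0):
    p_a, p_b = memberships(x, components)
    print(f"x={x}: A={p_a:.2f}, B={p_b:.2f}")
```

With these made-up parameters, a point at the midpoint (x = 2) is split evenly between the clusters, and a point at x = 1.79 comes out at roughly 70% A, 30% B.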

Each cluster: a Gaussian

Each Gaussian is described by a centre (mean), a spread and orientation (covariance), and a mixing weight (the share of the data it accounts for).
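Written as a formula, the three ingredients combine into one density. The notation here is the conventional one and an assumption of this sketch, not something the text above defines: K components, with mixing weight π_k, mean μ_k, and covariance Σ_k for component k.

```latex
p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k),
\qquad \pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1
```

The weights are non-negative and sum to one, so the mixture is itself a valid probability density.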

Membership: a probability

Points near a boundary are honestly shared between clusters, not forced one way.

Shape: ellipses

A full covariance matrix lets each blob stretch and tilt; K-Means, which only measures straight-line distance to the centre, is effectively stuck with round clusters.
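One way to see the effect of covariance is to compare distances. A Gaussian scores a point by its squared Mahalanobis distance (the quantity in the exponent of the density), which depends on direction once the covariance is tilted. The sketch below is a minimal 2-D illustration; the covariance matrices, the test points, and the helper name `mahalanobis_sq` are assumptions chosen for the example.

```python
def mahalanobis_sq(point, mean, cov):
    """Squared Mahalanobis distance for a 2-D point and a 2x2 covariance."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx = point[0] - mean[0]
    dy = point[1] - mean[1]
    # Inverse of [[a, b], [c, d]] is [[d, -b], [-c, a]] / det.
    return (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det

centre = (0.0, 0.0)
round_cov = ((1.0, 0.0), (0.0, 1.0))    # circular blob
tilted_cov = ((2.0, 1.5), (1.5, 2.0))   # blob stretched along the y = x diagonal

along = (1.0, 1.0)    # lies along the stretched direction
across = (1.0, -1.0)  # same straight-line distance, perpendicular direction

for name, cov in (("round", round_cov), ("tilted", tilted_cov)):
    print(name,
          mahalanobis_sq(along, centre, cov),
          mahalanobis_sq(across, centre, cov))
```

Both points are the same straight-line distance from the centre, so the round blob treats them identically. The tilted blob treats the point along its long axis as much closer than the point across it, which is the behaviour a purely distance-based method like K-Means cannot express.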

Fitting with EM

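The Expectation–Maximization (EM) algorithm alternates two steps until the fit stops improving: the E-step computes soft responsibilities (each point's membership probabilities) given the current Gaussians; the M-step then moves, reshapes, and reweights each Gaussian to fit the points, counting each point in proportion to its responsibility.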
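The two steps can be sketched in a few dozen lines for the one-dimensional case. This is a minimal, unoptimised illustration rather than a reference implementation: the toy data, the starting means, the variance floor `min_var`, and all function names are assumptions made for the example.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def e_step(data, weights, means, variances):
    """Responsibilities resp[i][k]: probability that point i came from component k."""
    resp = []
    for x in data:
        joint = [w * gaussian_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
        total = sum(joint)
        resp.append([j / total for j in joint])
    return resp

def m_step(data, resp, min_var=1e-6):
    """Re-estimate each component from responsibility-weighted points."""
    n, k = len(data), len(resp[0])
    weights, means, variances = [], [], []
    for j in range(k):
        r_j = [r[j] for r in resp]
        n_j = sum(r_j)  # effective number of points in component j
        mean = sum(r * x for r, x in zip(r_j, data)) / n_j
        var = sum(r * (x - mean) ** 2 for r, x in zip(r_j, data)) / n_j
        weights.append(n_j / n)
        means.append(mean)
        variances.append(max(var, min_var))  # floor avoids a collapsed component
    return weights, means, variances

def log_likelihood(data, weights, means, variances):
    """Total log-likelihood of the data under the current mixture."""
    return sum(
        math.log(sum(w * gaussian_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)))
        for x in data
    )

def fit_gmm_1d(data, init_means, n_iter=100, tol=1e-8):
    """Alternate E and M steps until the log-likelihood stops improving."""
    k = len(init_means)
    weights, means, variances = [1.0 / k] * k, list(init_means), [1.0] * k
    prev = -math.inf
    for _ in range(n_iter):
        resp = e_step(data, weights, means, variances)
        weights, means, variances = m_step(data, resp)
        ll = log_likelihood(data, weights, means, variances)
        if ll - prev < tol:
            break
        prev = ll
    return weights, means, variances

# Toy data: two well-separated groups around 1 and 5.
data = [0.8, 1.0, 1.2, 0.9, 1.1, 4.8, 5.0, 5.2, 5.3, 4.7]
weights, means, variances = fit_gmm_1d(data, init_means=[0.0, 6.0])
print(weights, means, variances)
```

On this toy data the fitted means settle close to the two group averages. On harder data EM only finds a local optimum, so the result can depend on the starting means.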

GMM vs K-Means

K-Means
  • Hard assignment: one cluster each
  • Favours round, similar-size clusters
  • (Effectively a limiting case of GMM: equal spherical covariances shrunk towards zero, so soft memberships harden)

GMM
  • Soft, probabilistic memberships
  • Elliptical, tilted, different-size blobs
  • Gives a likelihood you can use for anomaly scores
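The anomaly-score idea can be sketched as follows: evaluate the log-density of the mixture at a point and flag the point when it falls below a cut-off. The mixture parameters, the `threshold` value, and the function names are all hypothetical choices for illustration; in practice the cut-off would be tuned on real data.

```python
import math

def log_density(x, components):
    """Log of the mixture density at x; components are (weight, mean, std)."""
    log_terms = [
        math.log(w) - math.log(s) - 0.5 * math.log(2.0 * math.pi)
        - 0.5 * ((x - m) / s) ** 2
        for w, m, s in components
    ]
    # log-sum-exp keeps the result finite for points far from every component.
    top = max(log_terms)
    return top + math.log(sum(math.exp(t - top) for t in log_terms))

# Hypothetical fitted mixture and cut-off.
components = [(0.5, 0.0, 1.0), (0.5, 4.0, 1.0)]
threshold = -6.0

def is_anomaly(x):
    """Flag x when the mixture considers it very unlikely."""
    return log_density(x, components) < threshold

for x in (0.0, 2.0, 10.0):
    print(x, round(log_density(x, components), 3), is_anomaly(x))
```

Points near either centre, or between the two, score well above the cut-off, while a point far from both components gets a very low log-density and is flagged.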

Same EM idea, everywhere

The E-step / M-step dance (guess the hidden assignments, update the parameters, repeat) is a general recipe for models with latent variables, well beyond clustering; hidden Markov models, for example, are commonly trained this way.