Variational Autoencoders (VAEs)

Tags: Gen AI · latent space · generative sampling

From compressor to generator

A plain autoencoder maps each input to a single point in latent space. Nothing constrains the gaps between those points, so codes picked from the gaps tend to decode to gibberish and you can't just invent new data. A VAE fixes this by encoding each input to a distribution (a mean and a spread) instead of a point. Train it right and the latent space becomes smooth: nearby points decode to similar, sensible outputs, so you can sample brand-new data.

Encode to μ, σ: a blob, not a dot

The encoder outputs a mean μ and standard deviation σ — a little cloud in latent space.

Sample z: z ~ N(μ, σ²)

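Draw a random point from that cloud. The reparameterization trick writes the sample as z = μ + σ·ε with ε ~ N(0, 1), so the randomness lives in ε and gradients can still flow through μ and σ.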
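
Here is a minimal sketch of the sampling step in plain Python, with no deep-learning framework. The function name reparameterize and the example values are assumptions made for illustration. In a real VAE the same arithmetic runs on tensors so that autodiff can track μ and σ.

```python
import random

def reparameterize(mu, sigma, rng=random):
    """Sample z = mu + sigma * eps, with eps ~ N(0, 1) drawn per dimension.

    The noise eps carries all the randomness; z is then a plain
    deterministic function of mu and sigma, which is what lets
    gradients reach the encoder.
    """
    eps = [rng.gauss(0.0, 1.0) for _ in mu]
    z = [m + s * e for m, s, e in zip(mu, sigma, eps)]
    return z, eps

# Hypothetical encoder output for one input: a 2-D latent "cloud".
rng = random.Random(0)
mu = [0.5, -1.0]
sigma = [0.1, 2.0]
z, eps = reparameterize(mu, sigma, rng)
```

For a fixed ε, the derivative of z with respect to μ is 1 and with respect to σ is ε, so the sampling step does not block backpropagation.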

KL term: tidy the space

A KL-divergence penalty pulls every cloud toward a standard normal N(0, I), so the clouds overlap and the space is left with few holes.
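
For a diagonal Gaussian cloud the penalty has a closed form: KL = ½ Σ (μ² + σ² − 1 − ln σ²), summed over latent dimensions. The sketch below computes it in plain Python. The function name kl_to_standard_normal is an assumption made for illustration, and σ is taken to be strictly positive.

```python
import math

def kl_to_standard_normal(mu, sigma):
    """KL( N(mu, sigma^2) || N(0, 1) ), summed over latent dimensions.

    Assumes a diagonal Gaussian and sigma > 0 in every dimension.
    """
    return 0.5 * sum(
        m * m + s * s - 1.0 - math.log(s * s)
        for m, s in zip(mu, sigma)
    )

# A cloud that already matches the standard normal costs nothing.
kl_match = kl_to_standard_normal([0.0, 0.0], [1.0, 1.0])

# A cloud that has drifted away and shrunk is penalized.
kl_far = kl_to_standard_normal([3.0, -2.0], [0.1, 0.1])
```

Training adds this term to the reconstruction loss, so the encoder is pushed to keep every cloud near the origin with a spread near 1 unless moving it buys a better reconstruction.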

Encode, sample, decode — then generate

Follow one input to a latent distribution, sample a code, decode it back — then see the real payoff: sampling random points from the smooth latent space to generate data the model never saw.
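
The two paths described above can be sketched end to end. Everything here is a toy: toy_encoder and toy_decoder are hypothetical stand-ins with hand-picked weights, not trained networks, and the 3-value input and 2-D latent space are assumptions chosen to keep the example small.

```python
import math
import random

LATENT_DIM = 2

def toy_encoder(x):
    """Hypothetical stand-in for a trained encoder: input -> (mu, sigma)."""
    mu = [sum(x) / len(x), x[0] - x[-1]]
    sigma = [0.1, 0.1]  # a real encoder predicts the spread per input
    return mu, sigma

def toy_decoder(z):
    """Hypothetical stand-in for a trained decoder: code -> 3 values in (0, 1)."""
    weights = [[1.0, 0.5], [0.2, -0.3], [-0.7, 0.9]]
    logits = [w[0] * z[0] + w[1] * z[1] for w in weights]
    return [1.0 / (1.0 + math.exp(-a)) for a in logits]

def sample(mu, sigma, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1) per dimension."""
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]

rng = random.Random(0)

# Path 1: encode an input, sample a code from its cloud, decode it back.
x = [0.2, 0.9, 0.4]
mu, sigma = toy_encoder(x)
z = sample(mu, sigma, rng)
x_hat = toy_decoder(z)

# Path 2: skip the encoder, sample from the standard normal prior, decode.
z_new = sample([0.0] * LATENT_DIM, [1.0] * LATENT_DIM, rng)
x_new = toy_decoder(z_new)
```

Path 2 never touches the encoder. That only works because the KL term has kept the training clouds close to the standard normal the new code is drawn from.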

Where VAEs sit

VAE strengths
  • Smooth, structured latent space you can interpolate
  • Stable training: one objective with two terms (reconstruction + KL), no adversarial game
  • Encoder + decoder — good for representation learning
Trade-offs
  • Samples are often a bit blurry
  • Less crisp than GANs
  • Beaten on raw image quality by diffusion
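
The interpolation listed under VAE strengths can be sketched as a straight line between two latent codes. The function name interpolate and the example codes are assumptions made for illustration; decoding each step with a trained decoder would give a gradual morph between the two inputs.

```python
def interpolate(z_a, z_b, steps):
    """Return `steps` codes evenly spaced on the line from z_a to z_b, endpoints included."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    path = []
    for i in range(steps):
        t = i / (steps - 1)  # runs from 0.0 to 1.0
        path.append([(1 - t) * a + t * b for a, b in zip(z_a, z_b)])
    return path

# Hypothetical latent codes for two encoded inputs.
path = interpolate([0.0, 0.0], [1.0, -2.0], steps=5)
```
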
The idea echoes everywhere

A smooth, samplable latent space is a foundation many modern generative models build on. Latent diffusion, for example, runs the diffusion process inside a VAE-style latent space.