Variational Autoencoders (VAEs) · Suman Bhadra Notes

From compressor to generator

A plain autoencoder maps each input to a single point in latent space. Nothing trains the decoder on the gaps between those points, so codes picked there tend to decode to gibberish and you can't just invent new data. A VAE fixes this by encoding each input to a distribution (a mean and a spread) instead of a point. Train it right and the latent space becomes smooth: nearby points decode to similar, sensible outputs, so you can sample brand-new data.

Encode to μ, σ: a blob, not a dot

The encoder outputs a mean μ and standard deviation σ — a little cloud in latent space.

Sample z: z ~ N(μ, σ²)

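Draw a random point from that cloud. The reparameterization trick writes the draw as z = μ + σ·ε with ε ~ N(0, I), so the randomness sits in ε and gradients can still flow back to μ and σ. That is what keeps the sampling step trainable.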
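
The sampling step can be sketched in a few lines of NumPy. This is a minimal illustration, not a training-ready layer: the function name `reparameterize` and the example values of `mu` and `sigma` are made up for the demo, and a real model would use an autodiff framework so gradients actually propagate.

```python
import numpy as np

def reparameterize(mu, sigma, rng):
    """Draw z ~ N(mu, sigma^2) as z = mu + sigma * eps, with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)  # all of the randomness lives in eps
    return mu + sigma * eps              # deterministic function of mu and sigma

# Hypothetical encoder outputs for one input (illustrative values only).
rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
sigma = np.array([0.5, 0.1])

# Many draws from the same cloud should recover its mean and spread.
samples = np.stack([reparameterize(mu, sigma, rng) for _ in range(20000)])
sample_mean = samples.mean(axis=0)
sample_std = samples.std(axis=0)
```

Because `z` is an ordinary arithmetic expression in `mu` and `sigma`, a framework with automatic differentiation can push gradients through it, which would not be possible through a direct call to a sampler.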

KL term: tidy the space

A penalty (the KL divergence between each cloud and the standard normal N(0, I)) pulls all the clouds toward a standard normal, so they overlap and the space is left with few holes.
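
For a diagonal Gaussian cloud, this penalty has a well-known closed form: KL = ½ Σ (μ² + σ² − 1 − ln σ²), summed over latent dimensions. The sketch below computes it directly; the function name `kl_to_standard_normal` and the test inputs are illustrative assumptions, not part of these notes.

```python
import numpy as np

def kl_to_standard_normal(mu, sigma):
    """KL( N(mu, sigma^2) || N(0, I) ) for a diagonal Gaussian, summed over dimensions."""
    var = sigma ** 2
    return 0.5 * np.sum(mu ** 2 + var - 1.0 - np.log(var), axis=-1)

# A cloud that already matches the standard normal pays no penalty.
kl_zero = kl_to_standard_normal(np.zeros(2), np.ones(2))

# Shifting the mean by 1 in one dimension costs 0.5 * 1^2 = 0.5.
kl_shifted = kl_to_standard_normal(np.array([1.0, 0.0]), np.ones(2))

# A very narrow cloud is also penalized, which discourages collapsing to a dot.
kl_narrow = kl_to_standard_normal(np.zeros(2), np.full(2, 0.1))
```

The penalty grows both when a cloud drifts away from the origin and when it shrinks toward a point, which is how it keeps the clouds packed together.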

Encode, sample, decode — then generate

Follow one input to a latent distribution, sample a code, decode it back — then see the real payoff: sampling random points from the smooth latent space to generate data the model never saw.
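
The whole loop fits in a short NumPy sketch. Everything here is a stand-in: the weights are random and untrained, the linear encoder and decoder, the dimensions, and the squared-error reconstruction loss are assumptions chosen for brevity, and the encoder predicts log-variance (a common convention) rather than σ directly.

```python
import numpy as np

rng = np.random.default_rng(0)
x_dim, z_dim = 8, 2

# Hypothetical, untrained weights standing in for a learned encoder and decoder.
W_mu = rng.standard_normal((x_dim, z_dim)) * 0.1
W_logvar = rng.standard_normal((x_dim, z_dim)) * 0.1
W_dec = rng.standard_normal((z_dim, x_dim)) * 0.1

def encode(x):
    """Map an input to the mean and standard deviation of its latent cloud."""
    mu = x @ W_mu
    log_var = x @ W_logvar              # assumption: predict log-variance, then convert
    return mu, np.exp(0.5 * log_var)    # sigma is always positive

def decode(z):
    """Map a latent code back to data space, squashed into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-(z @ W_dec)))

# 1) Encode one input to a distribution.
x = rng.random((1, x_dim))
mu, sigma = encode(x)

# 2) Sample a code with the reparameterization trick.
z = mu + sigma * rng.standard_normal(mu.shape)

# 3) Decode it back and score the two loss terms.
x_recon = decode(z)
recon_loss = np.sum((x - x_recon) ** 2)
kl = 0.5 * np.sum(mu ** 2 + sigma ** 2 - 1.0 - 2.0 * np.log(sigma))
loss = recon_loss + kl

# 4) Generate: skip the encoder and decode codes drawn from the prior N(0, I).
z_new = rng.standard_normal((5, z_dim))
x_new = decode(z_new)
```

With trained weights, step 4 is the payoff: the decoder only ever sees codes that look like draws from N(0, I) during training, so fresh draws from that prior land in regions it knows how to decode.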

Where VAEs sit

VAE strengths

Smooth, structured latent space you can interpolate
Stable training (just two loss terms: reconstruction and KL)
Encoder + decoder — good for representation learning
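
Interpolation, the first strength listed above, amounts to walking a straight line between two latent codes and decoding each point along the way. The sketch below shows only the latent-space half; the helper name `interpolate` and the two example codes are hypothetical, and in practice each row of `path` would be passed to a trained decoder.

```python
import numpy as np

def interpolate(z_a, z_b, steps):
    """Return `steps` evenly spaced latent codes on the line from z_a to z_b."""
    t = np.linspace(0.0, 1.0, steps)[:, None]   # column of blend weights in [0, 1]
    return (1.0 - t) * z_a + t * z_b

# Hypothetical codes for two encoded inputs.
z_a = np.array([-1.0, 0.0])
z_b = np.array([1.0, 2.0])

path = interpolate(z_a, z_b, steps=5)   # path[0] is z_a, path[-1] is z_b
```

In a smooth latent space the decoded outputs along `path` would be expected to morph gradually from one input to the other rather than pass through nonsense.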

Trade-offs

Samples are often a bit blurry
Less crisp than GANs
Beaten on raw image quality by diffusion

The idea echoes everywhere

A smooth, samplable latent space is the foundation modern generative models build on — latent diffusion runs the diffusion process inside a VAE-style latent space for exactly this reason.