Introduction
A variational autoencoder (VAE) is a generative model that learns a latent representation of data. It combines an encoder, a decoder, and a probabilistic regularization term that shapes the latent space.
VAEs are useful for:
- Generating samples.
- Learning compressed representations.
- Interpolating between examples.
- Modeling uncertainty in latent variables.
Autoencoder vs VAE
A standard autoencoder maps an input to a latent vector and reconstructs the input:
input -> encoder -> latent vector -> decoder -> reconstruction
A VAE maps an input to a distribution over latent variables, usually a Gaussian:
input -> encoder -> mean and variance -> sample z -> decoder -> reconstruction
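Because the decoder must reconstruct the input from any $z$ sampled around the predicted mean, and a KL regularization term pulls each encoded distribution toward a fixed prior, the latent space becomes smoother and easier to sample from.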
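The difference between the two pipelines can be sketched with 1-D toy encoders. The linear "networks", their weights, and the function names `ae_encode`, `vae_encode`, and `vae_sample_z` are hypothetical stand-ins for trained layers; the VAE encoder is assumed to predict a log-variance, which is a common convention rather than something the text prescribes.

```python
import math
import random

# Hypothetical 1-D toy "networks": fixed linear maps stand in for trained layers.
W_ENC, B_ENC = 0.8, 0.1          # encoder / mean head (assumed values)
W_LOGVAR, B_LOGVAR = 0.0, -1.0   # log-variance head (assumed values)

def ae_encode(x):
    """Standard autoencoder: one input maps to one latent value."""
    return W_ENC * x + B_ENC

def vae_encode(x):
    """VAE encoder: one input maps to the parameters of a Gaussian over z."""
    mu = W_ENC * x + B_ENC
    log_var = W_LOGVAR * x + B_LOGVAR
    return mu, log_var

def vae_sample_z(x, rng):
    """Draw one latent sample z ~ N(mu, sigma^2) for input x."""
    mu, log_var = vae_encode(x)
    sigma = math.exp(0.5 * log_var)
    return rng.gauss(mu, sigma)

rng = random.Random(0)
x = 2.0
print(ae_encode(x), ae_encode(x))                   # identical on every call
print(vae_sample_z(x, rng), vae_sample_z(x, rng))   # different draws around mu
```

The autoencoder always returns the same code for the same input, while the VAE returns a different $z$ on each call, scattered around the predicted mean.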
Latent Variable View
The VAE assumes data is generated from latent variables:
$$ z \sim p(z) $$
$$ x \sim p_\theta(x \mid z) $$
The prior $p(z)$ is often a standard normal distribution. The decoder learns $p_\theta(x \mid z)$.
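The generative story above corresponds to ancestral sampling: draw $z$ from the prior, then draw $x$ from the decoder's distribution. A minimal sketch, assuming a 1-D latent, a standard normal prior, and a hypothetical linear `decoder_mean` with fixed Gaussian observation noise (a real decoder would be a neural network with parameters $\theta$):

```python
import random

# Hypothetical decoder: a fixed linear map from z to the mean of p(x | z).
def decoder_mean(z):
    return 3.0 * z + 1.0

DECODER_STD = 0.5  # assumed fixed observation noise for a Gaussian p(x | z)

def sample_x(rng):
    """Ancestral sampling: first z ~ p(z) = N(0, 1), then x ~ p_theta(x | z)."""
    z = rng.gauss(0.0, 1.0)
    x = rng.gauss(decoder_mean(z), DECODER_STD)
    return z, x

rng = random.Random(42)
samples = [sample_x(rng) for _ in range(10_000)]
mean_x = sum(x for _, x in samples) / len(samples)
print(mean_x)  # close to decoder_mean(0) = 1.0, since z is centered at 0
```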
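The encoder defines an approximate posterior:
$$ q_\phi(z \mid x) $$
which stands in for the true posterior $p_\theta(z \mid x)$. The true posterior is usually intractable: by Bayes' rule it requires the marginal likelihood $p_\theta(x) = \int p_\theta(x \mid z)\, p(z)\, dz$, an integral over all latent values that has no closed form when the decoder is a neural network.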
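To see where the difficulty lies, the sketch below computes $p(x)$ and $p(z \mid x)$ by brute-force grid integration for a 1-D linear-Gaussian model. That model is an assumption chosen only because its marginal has a closed form to check against; with $d$ latent dimensions, a grid of the same resolution needs $n^d$ decoder evaluations, which is why this route does not scale to real VAEs.

```python
import math

def normal_pdf(v, mean, std):
    return math.exp(-0.5 * ((v - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))

# Toy linear-Gaussian model (assumed for illustration):
#   p(z) = N(0, 1),  p(x | z) = N(W * z + B, S^2)
W, B, S = 3.0, 1.0, 0.5

def marginal_by_grid(x, lo=-8.0, hi=8.0, n=4001):
    """Approximate p(x) = integral of p(x | z) p(z) dz with a Riemann sum over z."""
    dz = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        z = lo + i * dz
        total += normal_pdf(x, W * z + B, S) * normal_pdf(z, 0.0, 1.0) * dz
    return total

def marginal_exact(x):
    """Only valid for this linear-Gaussian model: p(x) = N(B, W^2 + S^2)."""
    return normal_pdf(x, B, math.sqrt(W ** 2 + S ** 2))

def posterior_by_grid(z, x):
    """Bayes' rule: p(z | x) = p(x | z) p(z) / p(x), which needs the marginal p(x)."""
    return normal_pdf(x, W * z + B, S) * normal_pdf(z, 0.0, 1.0) / marginal_by_grid(x)

x_obs = 2.5
print(marginal_by_grid(x_obs), marginal_exact(x_obs))  # the two should agree closely
```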
Objective
The VAE optimizes the evidence lower bound (ELBO):
$$ \mathrm{ELBO} = \mathbb{E}_{q_\phi(z \mid x)}[\log p_\theta(x \mid z)] - \mathrm{KL}(q_\phi(z \mid x) \,\|\, p(z)) $$
The first term encourages accurate reconstruction. The KL term keeps the learned latent distribution close to the prior.
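A small numerical check makes the "lower bound" concrete. The sketch below assumes a 1-D linear-Gaussian toy model (so the exact $\log p(x)$ and the exact posterior are known), estimates the reconstruction term by Monte Carlo, and uses the closed-form KL between a 1-D Gaussian and $\mathcal{N}(0, 1)$. When $q$ equals the true posterior the bound is tight; a poor $q$ gives a much lower value. All names and constants are hypothetical.

```python
import math
import random

# Toy model (assumed for illustration): p(z) = N(0, 1), p(x | z) = N(W z + B, S^2).
W, B, S = 3.0, 1.0, 0.5

def log_normal(v, mean, std):
    return -0.5 * math.log(2.0 * math.pi * std ** 2) - 0.5 * ((v - mean) / std) ** 2

def kl_to_standard_normal(mu, sigma):
    """Closed-form KL(N(mu, sigma^2) || N(0, 1)) for a 1-D Gaussian."""
    return 0.5 * (mu ** 2 + sigma ** 2 - 1.0 - math.log(sigma ** 2))

def elbo_estimate(x, mu, sigma, n_samples, rng):
    """Monte Carlo reconstruction term minus the closed-form KL term."""
    recon = 0.0
    for _ in range(n_samples):
        z = rng.gauss(mu, sigma)              # z ~ q(z | x) = N(mu, sigma^2)
        recon += log_normal(x, W * z + B, S)  # log p(x | z)
    recon /= n_samples
    return recon - kl_to_standard_normal(mu, sigma)

x = 2.5
log_px = log_normal(x, B, math.sqrt(W ** 2 + S ** 2))  # exact log p(x) for this linear model

# The exact posterior of this model is Gaussian; using it as q makes the bound tight.
post_var = 1.0 / (1.0 + W ** 2 / S ** 2)
post_mu = post_var * W * (x - B) / S ** 2
rng = random.Random(0)
tight = elbo_estimate(x, post_mu, math.sqrt(post_var), 20_000, rng)
loose = elbo_estimate(x, 0.0, 1.0, 20_000, rng)  # q = prior: a poor approximation
print(log_px, tight, loose)  # tight is close to log_px; loose is far below it
```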
Training maximizes the ELBO, so in practice the loss to minimize is the negative ELBO, often written as:
loss = reconstruction_loss + KL_regularization
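For a concrete version of this loss, the sketch below assumes a Bernoulli decoder (inputs in $[0, 1]$, such as binarized pixels), so the reconstruction loss is binary cross-entropy summed over dimensions; a Gaussian decoder would give a squared-error term instead. The encoder is assumed to output a mean and a log-variance per latent dimension, which gives the standard closed-form KL to $\mathcal{N}(0, I)$. Function names are hypothetical.

```python
import math

def bce(x, x_hat, eps=1e-7):
    """Binary cross-entropy summed over dimensions (negative Bernoulli log-likelihood)."""
    total = 0.0
    for xi, pi in zip(x, x_hat):
        pi = min(max(pi, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= xi * math.log(pi) + (1.0 - xi) * math.log(1.0 - pi)
    return total

def kl_diag_gaussian(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m ** 2 - math.exp(lv) for m, lv in zip(mu, log_var))

def vae_loss(x, x_hat, mu, log_var):
    """Negative ELBO for one example: reconstruction loss + KL regularization."""
    return bce(x, x_hat) + kl_diag_gaussian(mu, log_var)

# With mu = 0 and log_var = 0 the KL term is zero, so only the reconstruction term remains.
print(vae_loss([1.0, 0.0], [0.5, 0.5], [0.0, 0.0], [0.0, 0.0]))  # equals 2 * ln 2
```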
Reparameterization Trick
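Drawing $z$ directly from $q_\phi(z \mid x)$ is not a differentiable operation with respect to $\phi$, so gradients cannot flow back to the encoder. The reparameterization trick rewrites sampling as a deterministic function of the encoder outputs and independent noise: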
$$ z = \mu + \sigma \odot \epsilon,\quad \epsilon \sim \mathcal{N}(0, I) $$
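All randomness now lives in $\epsilon$, which does not depend on $\phi$, so gradients flow through $\mu$ and $\sigma$ (the encoder's outputs) back to the encoder parameters $\phi$.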
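The trick can be checked numerically on a case with a known answer. This sketch assumes the encoder outputs a log-variance (a common convention, not stated in the text) and uses the toy objective $\mathbb{E}[z^2] = \mu^2 + \sigma^2$, whose exact gradients are $2\mu$ and $2\sigma$. Because $z = \mu + \sigma\epsilon$, the chain rule gives per-sample gradients $2z$ and $2z\epsilon$, and averaging them estimates the true gradients. Function names are hypothetical.

```python
import math
import random

def reparam_sample(mu, log_var, eps):
    """z = mu + sigma * eps with sigma = exp(0.5 * log_var); eps ~ N(0, 1) is drawn outside."""
    return mu + math.exp(0.5 * log_var) * eps

def reparam_grad_of_square(mu, log_var, n_samples, rng):
    """Estimate d/dmu and d/dsigma of E[z^2] along the reparameterized sample path.

    With z = mu + sigma * eps: dz/dmu = 1 and dz/dsigma = eps, so
    d(z^2)/dmu = 2z and d(z^2)/dsigma = 2z * eps.
    """
    g_mu = g_sigma = 0.0
    for _ in range(n_samples):
        eps = rng.gauss(0.0, 1.0)
        z = reparam_sample(mu, log_var, eps)
        g_mu += 2.0 * z
        g_sigma += 2.0 * z * eps
    return g_mu / n_samples, g_sigma / n_samples

mu, sigma = 1.5, 0.7
rng = random.Random(0)
g_mu, g_sigma = reparam_grad_of_square(mu, math.log(sigma ** 2), 50_000, rng)
# Exact gradients: 2 * mu = 3.0 and 2 * sigma = 1.4; the estimates should be close.
print(g_mu, g_sigma)
```

In a deep learning framework the same path is differentiated automatically; the point of the sketch is that the noise $\epsilon$ is sampled outside the computation that depends on $\mu$ and $\sigma$.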
Practical Notes
VAEs can produce smooth latent spaces, but samples may be blurrier than samples from GANs or diffusion models in image tasks.
Common issues:
- Posterior collapse, where the decoder ignores the latent variable.
- Poor reconstruction if the KL term is too strong.
- Latent dimensions that are hard to interpret.
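One common way to spot posterior collapse (a diagnostic assumed here, not described in the text) is to average the KL term separately for each latent dimension over a batch. Since KL is never negative, an average near zero means $q_\phi(z_j \mid x)$ matches the prior for essentially every input, so that dimension carries no information about $x$. The threshold below is a hypothetical choice.

```python
import math

def per_dim_kl(mus, log_vars):
    """Average KL(q(z_j | x) || N(0, 1)) for each latent dimension j over a batch.

    mus, log_vars: lists of per-example lists, shape [batch][latent_dim].
    """
    dim = len(mus[0])
    totals = [0.0] * dim
    for mu, lv in zip(mus, log_vars):
        for j in range(dim):
            totals[j] += -0.5 * (1.0 + lv[j] - mu[j] ** 2 - math.exp(lv[j]))
    return [t / len(mus) for t in totals]

def inactive_dims(mus, log_vars, threshold=0.01):
    """Indices of dimensions whose average KL is below a (hypothetical) threshold."""
    return [j for j, kl in enumerate(per_dim_kl(mus, log_vars)) if kl < threshold]

# Toy batch: dimension 0 varies with the input; dimension 1 always equals the prior N(0, 1).
mus = [[1.2, 0.0], [-0.8, 0.0], [0.5, 0.0]]
log_vars = [[-2.0, 0.0], [-2.0, 0.0], [-2.0, 0.0]]
print(inactive_dims(mus, log_vars))  # [1]
```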
Common fixes:
- KL annealing.
- Beta-VAE objectives.
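- Architecture changes, such as a more expressive encoder, or a decoder that cannot model the data well without using the latent variable (an overly powerful decoder is a known cause of posterior collapse).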
- Careful latent dimensionality selection.
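KL annealing and Beta-VAE objectives both amount to scaling the KL term. The sketch below assumes a linear warmup schedule; `warmup_steps` and `beta` are hypothetical hyperparameters. With `beta = 1` the annealed loss converges to the standard negative ELBO; `beta < 1` weakens the KL term (which can help with posterior collapse or poor reconstruction), while `beta > 1` strengthens it, as in the original Beta-VAE.

```python
def kl_weight(step, warmup_steps=10_000, beta=1.0):
    """Linear KL annealing: ramp the KL weight from 0 to beta over warmup_steps, then hold it."""
    return beta * min(1.0, step / warmup_steps)

def weighted_loss(reconstruction_loss, kl, step, warmup_steps=10_000, beta=1.0):
    """Reconstruction loss plus the annealed, beta-scaled KL term."""
    return reconstruction_loss + kl_weight(step, warmup_steps, beta) * kl

for step in (0, 2_500, 5_000, 10_000, 20_000):
    print(step, kl_weight(step))  # 0.0, 0.25, 0.5, 1.0, 1.0
```

Early in training the decoder is pushed to use $z$ for reconstruction before the KL term reaches full strength, which is the usual motivation for annealing.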
Closing
VAEs are a bridge between deep learning and probabilistic modeling. The central idea is simple: learn a latent distribution that can reconstruct data while remaining structured enough to sample from.