Optimal latent distribution choice for two-stage visual generative modeling
Determine which latent distribution structures for the encoder’s aggregate posterior are optimal for subsequent modeling of the latent prior in two-stage visual generative pipelines, where images are first compressed into latent codes and a prior over those latents is then modeled with diffusion or autoregressive methods.
References
Yet, existing approaches such as VAEs and foundation model aligned encoders implicitly constrain the latent space without explicitly shaping its distribution, making it unclear which types of distributions are optimal for modeling.
— Distribution Matching Variational AutoEncoder
(2512.07778 - Ye et al., 8 Dec 2025) in Abstract (page 1)