Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
88 tokens/sec
GPT-4o
11 tokens/sec
Gemini 2.5 Pro Pro
52 tokens/sec
o3 Pro
5 tokens/sec
GPT-4.1 Pro
10 tokens/sec
DeepSeek R1 via Azure Pro
33 tokens/sec
Gemini 2.5 Flash Deprecated
12 tokens/sec
2000 character limit reached

Latent Diffusion Models with Masked AutoEncoders (2507.09984v1)

Published 14 Jul 2025 in cs.CV

Abstract: In spite of remarkable potential of the Latent Diffusion Models (LDMs) in image generation, the desired properties and optimal design of the autoencoders have been underexplored. In this work, we analyze the role of autoencoders in LDMs and identify three key properties: latent smoothness, perceptual compression quality, and reconstruction quality. We demonstrate that existing autoencoders fail to simultaneously satisfy all three properties, and propose Variational Masked AutoEncoders (VMAEs), taking advantage of the hierarchical features maintained by Masked AutoEncoder. We integrate VMAEs into the LDM framework, introducing Latent Diffusion Models with Masked AutoEncoders (LDMAEs). Through comprehensive experiments, we demonstrate significantly enhanced image generation quality and computational efficiency.

Summary

We haven't generated a summary for this paper yet.