Compositional Discrete Latent Code for High Fidelity, Productive Diffusion Models (2507.12318v1)

Published 16 Jul 2025 in cs.CV, cs.AI, and cs.LG

Abstract: We argue that diffusion models' success in modeling complex distributions comes largely from their input conditioning. This paper investigates the representation used to condition diffusion models, from the perspective that an ideal representation should improve sample fidelity, be easy to generate, and be compositional so that samples outside the training distribution can be generated. We introduce Discrete Latent Code (DLC), an image representation derived from Simplicial Embeddings trained with a self-supervised learning objective. DLCs are sequences of discrete tokens, as opposed to the standard continuous image embeddings. They are easy to generate, and their compositionality enables sampling of novel images beyond the training distribution. Diffusion models trained with DLCs achieve improved generation fidelity, establishing a new state of the art for unconditional image generation on ImageNet. Additionally, we show that composing DLCs allows the image generator to produce out-of-distribution samples that coherently combine the semantics of images in diverse ways. Finally, we showcase how DLCs can enable text-to-image generation by leveraging large-scale pretrained LLMs. We efficiently finetune a text diffusion LLM to generate DLCs that produce novel samples outside of the image generator's training distribution.
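To make the idea of a discrete, composable code concrete, here is a minimal sketch in plain Python. It is one plausible reading of the abstract, not the authors' implementation: it assumes an encoder emits a flat vector of logits, that a simplicial embedding is a softmax applied to each group of logits, that the discrete code is the argmax of each group, and that two codes are composed by choosing each position's token from one of the two. The function names, the group count, the temperature, and the mixing rule are all hypothetical choices made for illustration.

```python
import math
import random


def simplicial_embedding(logits, num_groups, temperature=1.0):
    """Split a flat logit vector into equal groups and softmax each group.

    Assumption: each group is one simplex; the paper's exact layout may differ.
    """
    if len(logits) % num_groups != 0:
        raise ValueError("len(logits) must be divisible by num_groups")
    size = len(logits) // num_groups
    groups = []
    for g in range(num_groups):
        chunk = logits[g * size:(g + 1) * size]
        peak = max(chunk)  # subtract the max for numerical stability
        exps = [math.exp((x - peak) / temperature) for x in chunk]
        total = sum(exps)
        groups.append([e / total for e in exps])
    return groups


def to_discrete_code(groups):
    """Take the argmax of each simplex, giving one token index per group."""
    return [max(range(len(p)), key=p.__getitem__) for p in groups]


def compose(code_a, code_b, rng, p_a=0.5):
    """Hypothetical composition rule: per position, keep the token from
    code_a with probability p_a, otherwise take the token from code_b."""
    if len(code_a) != len(code_b):
        raise ValueError("codes must have the same length")
    return [a if rng.random() < p_a else b for a, b in zip(code_a, code_b)]


# Toy logits standing in for the encoder outputs of two images.
logits_a = [2.0, 0.1, -1.0, 0.3, 0.2, 3.0]
logits_b = [-1.0, 0.5, 2.5, 1.5, 0.0, -0.5]

code_a = to_discrete_code(simplicial_embedding(logits_a, num_groups=2))
code_b = to_discrete_code(simplicial_embedding(logits_b, num_groups=2))
mixed = compose(code_a, code_b, random.Random(0))
print(code_a, code_b, mixed)
```

Under these assumptions, the composed sequence is still a valid token sequence of the same length, which is what would let a generator conditioned on such codes accept it even though no training image produced that exact code.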

Authors (3)

Tweets

https://twitter.com/lavoiems/status/1947666702709428667

https://twitter.com/rosinality/status/1945707535727567204

https://twitter.com/bronzeagepapi/status/1947803339644707134
