RISSOLE: Parameter-efficient Diffusion Models via Block-wise Generation and Retrieval-Guidance (2408.17095v2)

Published 30 Aug 2024 in cs.CV and cs.LG

Abstract: Diffusion-based models demonstrate impressive generation capabilities. However, they also have a massive number of parameters, resulting in enormous model sizes that make them unsuitable for deployment on resource-constrained devices. Block-wise generation can be a promising alternative for designing compact (parameter-efficient) deep generative models, since the model can generate one block at a time instead of the whole image at once. However, block-wise generation is also considerably challenging because ensuring coherence across the generated blocks is non-trivial. To this end, we design a retrieval-augmented generation (RAG) approach and leverage the corresponding blocks of the images retrieved by the RAG module to condition the training and generation stages of a block-wise denoising diffusion model. Our conditioning schemes ensure coherence across the different blocks during training and, consequently, during generation. While we showcase our approach using the latent diffusion model (LDM) as the base model, it can be used with other variants of denoising diffusion models. We validate our solution to the coherence problem through substantive experiments demonstrating the approach's compact model size and excellent generation quality.
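
To make the block-wise, retrieval-guided scheme more concrete, here is a minimal sketch of how generation might proceed under the abstract's description: retrieve a similar image's latent from a database, then denoise one latent block at a time while conditioning each block on the corresponding block of the retrieved latent. All function names, shapes, and the conditioning signature below are illustrative assumptions, not the authors' code.

```python
import torch

# A minimal sketch of retrieval-guided block-wise generation, following
# the abstract's description. Every name here (retrieve_similar,
# denoise_block, the grid/latent shapes, step count) is an illustrative
# assumption, not the authors' implementation.

BLOCK_GRID = (2, 2)        # assume the latent is split into a 2x2 block grid
BLOCK_SHAPE = (4, 16, 16)  # assumed per-block LDM latent shape (C, H, W)
NUM_STEPS = 50             # assumed number of reverse-diffusion steps


def retrieve_similar(cond, database):
    """Hypothetical RAG step: return the stored latent whose key
    embedding is most similar to the conditioning vector."""
    sims = database["keys"] @ cond             # (N, d) @ (d,) -> (N,)
    return database["latents"][sims.argmax()]  # full-image latent (C, 2H, 2W)


@torch.no_grad()
def generate_blockwise(denoise_block, cond, database):
    """Generate one latent block at a time; each block's denoising is
    conditioned on the corresponding block of the retrieved latent,
    which is what keeps neighbouring blocks coherent."""
    retrieved = retrieve_similar(cond, database)
    rows, cols = BLOCK_GRID
    _, h, w = BLOCK_SHAPE
    blocks = []
    for idx in range(rows * cols):
        r, c = divmod(idx, cols)
        # The corresponding block of the retrieved latent guides this block.
        guide = retrieved[:, r * h:(r + 1) * h, c * w:(c + 1) * w].unsqueeze(0)
        x = torch.randn(1, *BLOCK_SHAPE)  # start the block from pure noise
        for t in reversed(range(NUM_STEPS)):
            # denoise_block stands in for a compact per-block denoiser
            # (e.g. a small U-Net) that accepts the retrieved guide block.
            x = denoise_block(x, t, guide=guide, context=cond)
        blocks.append(x)
    # Stitch the blocks back into a full latent; the LDM's VAE decoder
    # (not shown) would then map this latent to the final image.
    row_cats = [torch.cat(blocks[r * cols:(r + 1) * cols], dim=-1)
                for r in range(rows)]
    return torch.cat(row_cats, dim=-2)
```

In this reading, the retrieved guide block plays the coherence-enforcing role the abstract attributes to RAG conditioning: neighbouring output blocks are denoised against spatially consistent guidance from a single retrieved image. Per the abstract, the same conditioning is applied during training, which is presumably what allows a single compact per-block denoiser to stand in for a full-image model.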
