Simple Drop-in LoRA Conditioning on Attention Layers Will Improve Your Diffusion Model (2405.03958v3)

Published 7 May 2024 in cs.CV, cs.AI, and cs.LG

Abstract: Current state-of-the-art diffusion models employ U-Net architectures containing convolutional and (qkv) self-attention layers. The U-Net processes images while being conditioned on the time embedding input for each sampling step and the class or caption embedding input corresponding to the desired conditional generation. Such conditioning involves scale-and-shift operations applied to the convolutional layers but does not directly affect the attention layers. While these standard architectural choices are certainly effective, not conditioning the attention layers feels arbitrary and potentially suboptimal. In this work, we show that simply adding LoRA conditioning to the attention layers without changing or tuning the other parts of the U-Net architecture improves the image generation quality. For example, a drop-in addition of LoRA conditioning to the EDM diffusion model yields FID scores of 1.91/1.75 for unconditional and class-conditional CIFAR-10 generation, improving upon the baseline of 1.97/1.79.
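
To illustrate the idea the abstract describes, here is a minimal PyTorch sketch of what "LoRA conditioning on an attention layer" could look like: a frozen-style base qkv projection plus a low-rank branch whose output is gated by the time/class conditioning embedding. This is a hypothetical illustration, not the paper's exact implementation; the class names, rank, and gating scheme are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LoRAConditionedLinear(nn.Module):
    """Linear layer plus a low-rank (LoRA) branch modulated by a conditioning
    embedding (e.g., the diffusion U-Net's time/class embedding).

    Hypothetical sketch: the conditioning embedding is mapped to a per-sample
    gate that scales the rank-r update; the base projection is left unchanged.
    """

    def __init__(self, in_features, out_features, cond_dim, rank=4):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        # Low-rank factors: the up-projection is zero-initialized so the
        # LoRA branch starts as a no-op and the baseline behavior is preserved.
        self.lora_down = nn.Linear(in_features, rank, bias=False)
        self.lora_up = nn.Linear(rank, out_features, bias=False)
        nn.init.zeros_(self.lora_up.weight)
        # Maps the conditioning embedding to a gating vector for the LoRA output.
        self.cond_proj = nn.Linear(cond_dim, out_features)

    def forward(self, x, cond):
        # x: (batch, tokens, in_features); cond: (batch, cond_dim)
        gate = self.cond_proj(cond).unsqueeze(1)  # (batch, 1, out_features)
        return self.base(x) + gate * self.lora_up(self.lora_down(x))


class ConditionedSelfAttention(nn.Module):
    """Self-attention block whose qkv projection carries the conditioned LoRA branch."""

    def __init__(self, dim, cond_dim, num_heads=8, rank=4):
        super().__init__()
        self.num_heads = num_heads
        self.qkv = LoRAConditionedLinear(dim, 3 * dim, cond_dim, rank)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x, cond):
        b, n, d = x.shape
        qkv = self.qkv(x, cond).reshape(b, n, 3, self.num_heads, d // self.num_heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)  # each: (b, heads, n, head_dim)
        out = F.scaled_dot_product_attention(q, k, v)
        out = out.transpose(1, 2).reshape(b, n, d)
        return self.proj(out)


# Usage example: condition on a time/class embedding of dimension 256.
attn = ConditionedSelfAttention(dim=128, cond_dim=256)
x = torch.randn(2, 64, 128)      # (batch, tokens, channels)
cond = torch.randn(2, 256)       # time/class embedding
y = attn(x, cond)                # (2, 64, 128)
```

Because the LoRA up-projection starts at zero, dropping such a module into an existing attention layer leaves the initial forward pass identical to the unconditioned baseline, which matches the "drop-in" framing of the abstract.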
