Simple Drop-in LoRA Conditioning on Attention Layers Will Improve Your Diffusion Model (2405.03958v3)

Published 7 May 2024 in cs.CV, cs.AI, and cs.LG

Abstract: Current state-of-the-art diffusion models employ U-Net architectures containing convolutional and (qkv) self-attention layers. The U-Net processes images while being conditioned on the time embedding input for each sampling step and the class or caption embedding input corresponding to the desired conditional generation. Such conditioning involves scale-and-shift operations applied to the convolutional layers but does not directly affect the attention layers. While these standard architectural choices are certainly effective, not conditioning the attention layers feels arbitrary and potentially suboptimal. In this work, we show that simply adding LoRA conditioning to the attention layers without changing or tuning the other parts of the U-Net architecture improves the image generation quality. For example, a drop-in addition of LoRA conditioning to the EDM diffusion model yields FID scores of 1.91/1.75 for unconditional and class-conditional CIFAR-10 generation, improving upon the baseline of 1.97/1.79.
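
To illustrate the idea the abstract describes, here is a minimal PyTorch sketch of what "LoRA conditioning on an attention layer" could look like: a frozen-style base qkv projection plus a low-rank branch whose output is gated by the time/class conditioning embedding. This is a hypothetical illustration, not the paper's exact implementation; the class names, rank, and gating scheme are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LoRAConditionedLinear(nn.Module):
    """Linear layer plus a low-rank (LoRA) branch modulated by a conditioning
    embedding (e.g., the diffusion U-Net's time/class embedding).

    Hypothetical sketch: the conditioning embedding is mapped to a per-sample
    gate that scales the rank-r update; the base projection is left unchanged.
    """

    def __init__(self, in_features, out_features, cond_dim, rank=4):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        # Low-rank factors: the up-projection is zero-initialized so the
        # LoRA branch starts as a no-op and the baseline behavior is preserved.
        self.lora_down = nn.Linear(in_features, rank, bias=False)
        self.lora_up = nn.Linear(rank, out_features, bias=False)
        nn.init.zeros_(self.lora_up.weight)
        # Maps the conditioning embedding to a gating vector for the LoRA output.
        self.cond_proj = nn.Linear(cond_dim, out_features)

    def forward(self, x, cond):
        # x: (batch, tokens, in_features); cond: (batch, cond_dim)
        gate = self.cond_proj(cond).unsqueeze(1)  # (batch, 1, out_features)
        return self.base(x) + gate * self.lora_up(self.lora_down(x))


class ConditionedSelfAttention(nn.Module):
    """Self-attention block whose qkv projection carries the conditioned LoRA branch."""

    def __init__(self, dim, cond_dim, num_heads=8, rank=4):
        super().__init__()
        self.num_heads = num_heads
        self.qkv = LoRAConditionedLinear(dim, 3 * dim, cond_dim, rank)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x, cond):
        b, n, d = x.shape
        qkv = self.qkv(x, cond).reshape(b, n, 3, self.num_heads, d // self.num_heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)  # each: (b, heads, n, head_dim)
        out = F.scaled_dot_product_attention(q, k, v)
        out = out.transpose(1, 2).reshape(b, n, d)
        return self.proj(out)


# Usage example: condition on a time/class embedding of dimension 256.
attn = ConditionedSelfAttention(dim=128, cond_dim=256)
x = torch.randn(2, 64, 128)      # (batch, tokens, channels)
cond = torch.randn(2, 256)       # time/class embedding
y = attn(x, cond)                # (2, 64, 128)
```

Because the LoRA up-projection starts at zero, dropping such a module into an existing attention layer leaves the initial forward pass identical to the unconditioned baseline, which matches the "drop-in" framing of the abstract.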
