Upsample Guidance: Scale Up Diffusion Models without Training (2404.01709v1)

Published 2 Apr 2024 in cs.CV and cs.AI

Abstract: Diffusion models have demonstrated superior performance across various generative tasks including images, videos, and audio. However, they encounter difficulties in directly generating high-resolution samples. Previously proposed solutions to this issue involve modifying the architecture, further training, or partitioning the sampling process into multiple stages. These methods have the limitation of not being able to directly utilize pre-trained models as-is, requiring additional work. In this paper, we introduce upsample guidance, a technique that adapts a pretrained diffusion model (e.g., $512^2$) to generate higher-resolution images (e.g., $1536^2$) by adding only a single term in the sampling process. Remarkably, this technique does not necessitate any additional training or reliance on external models. We demonstrate that upsample guidance can be applied to various models, such as pixel-space, latent-space, and video diffusion models. We also observe that proper selection of the guidance scale can improve image quality, fidelity, and prompt alignment.
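To make the phrase "adding only a single term in the sampling process" concrete, here is a minimal sketch of what such a term could look like. It is an illustration under stated assumptions, not the paper's exact formulation: the abstract does not give the formula, so the specific form of the correction (low-resolution noise prediction minus the downsampled high-resolution prediction, upsampled and scaled by a guidance weight `w`) is assumed, and any timestep or signal-power adjustment the authors may apply is omitted. The names `guided_eps`, `toy_eps_model`, `downsample`, and `upsample` are hypothetical.

```python
import numpy as np

def downsample(x, m):
    """Average-pool an (H, W) array by an integer factor m."""
    h, w = x.shape
    return x.reshape(h // m, m, w // m, m).mean(axis=(1, 3))

def upsample(x, m):
    """Nearest-neighbour upsample an (H, W) array by an integer factor m."""
    return np.repeat(np.repeat(x, m, axis=0), m, axis=1)

def guided_eps(eps_model, x, t, m, w):
    """Noise prediction at high resolution plus one guidance term.

    ASSUMED form (not taken from the abstract):
        eps_hi + w * U[ eps(D[x], t) - D[eps_hi] ]
    where D and U are down/upsampling by factor m and w is the guidance scale.
    """
    eps_hi = eps_model(x, t)                  # prediction at the target resolution
    eps_lo = eps_model(downsample(x, m), t)   # prediction at the trained resolution
    guidance = upsample(eps_lo - downsample(eps_hi, m), m)
    return eps_hi + w * guidance

def toy_eps_model(x, t):
    """Stand-in for a pretrained noise predictor (hypothetical, nonlinear)."""
    return t * np.tanh(x)

# Example: a 6x6 "image", resolution factor m=3, guidance scale w=0.2.
x = np.arange(36, dtype=float).reshape(6, 6) / 36.0
eps = guided_eps(toy_eps_model, x, t=0.5, m=3, w=0.2)
```

With `w = 0` the function reduces to the unmodified model prediction, which is why a scheme of this shape needs no retraining: the pretrained network is only called twice per step, once at each resolution.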

Authors (3)
  1. Juno Hwang (4 papers)
  2. Yong-Hyun Park (8 papers)
  3. Junghyo Jo (36 papers)
Citations (7)