ScaleCrafter: Tuning-free Higher-Resolution Visual Generation with Diffusion Models (2310.07702v1)

Published 11 Oct 2023 in cs.CV

Abstract: In this work, we investigate the capability of pre-trained diffusion models to generate images at much higher resolutions than their training image sizes, and with arbitrary aspect ratios. When generating images directly at 1024 x 1024 with a Stable Diffusion model pre-trained on 512 x 512 images, we observe persistent object repetition and unreasonable object structures. Existing approaches to higher-resolution generation, such as attention-based and joint-diffusion methods, cannot adequately address these issues. Taking a new perspective, we examine the structural components of the U-Net in diffusion models and identify the crucial cause as the limited receptive field of the convolutional kernels. Based on this key observation, we propose a simple yet effective re-dilation approach that dynamically adjusts the convolutional receptive field during inference. We further propose dispersed convolution and noise-damped classifier-free guidance, which enable ultra-high-resolution image generation (e.g., 4096 x 4096). Notably, our approach requires no training or optimization. Extensive experiments demonstrate that it resolves the repetition issue and achieves state-of-the-art performance on higher-resolution image synthesis, especially in texture detail. Our work also shows that a diffusion model pre-trained on low-resolution images can be used directly for high-resolution visual generation without further tuning, which may inform future research on ultra-high-resolution image and video synthesis.
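The core idea described in the abstract is to enlarge the receptive field of the pre-trained convolutions at inference time rather than retraining the model. Below is a minimal, hypothetical PyTorch sketch of what such re-dilation could look like; the dilation factor, the choice of which layers to adjust, and the helper names (redilate_conv, apply_redilation) are illustrative assumptions, not the paper's exact implementation.

```python
import torch.nn as nn

def redilate_conv(conv: nn.Conv2d, factor: int) -> None:
    # Enlarge the receptive field of a pretrained 3x3 conv by increasing its
    # dilation; padding is increased to match so the output spatial size is
    # preserved. The pretrained weights are reused unchanged (no retraining).
    if conv.kernel_size == (3, 3):
        conv.dilation = (factor, factor)
        conv.padding = (factor, factor)

def apply_redilation(unet: nn.Module, factor: int = 2) -> None:
    # Walk the denoising U-Net and re-dilate its 3x3 convolutions before
    # sampling at a higher resolution (e.g., factor=2 when going from a
    # 512x512-trained model to 1024x1024 outputs). Which blocks and timesteps
    # to adjust is a design choice in the paper; for simplicity, this sketch
    # adjusts every 3x3 conv.
    for module in unet.modules():
        if isinstance(module, nn.Conv2d):
            redilate_conv(module, factor)
```

In this sketch, a factor of 2 roughly matches the 512-to-1024 resolution jump discussed in the abstract; the paper's dispersed convolution and noise-damped classifier-free guidance, which it uses for 4096 x 4096 generation, are not represented here.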

Authors (10)
  1. Yingqing He (23 papers)
  2. Shaoshu Yang (4 papers)
  3. Haoxin Chen (12 papers)
  4. Xiaodong Cun (61 papers)
  5. Menghan Xia (33 papers)
  6. Yong Zhang (660 papers)
  7. Xintao Wang (132 papers)
  8. Ran He (172 papers)
  9. Qifeng Chen (187 papers)
  10. Ying Shan (252 papers)
Citations (47)