
Inf-DiT: Upsampling Any-Resolution Image with Memory-Efficient Diffusion Transformer (2405.04312v2)

Published 7 May 2024 in cs.CV

Abstract: Diffusion models have shown remarkable performance in image generation in recent years. However, due to a quadratic increase in memory when generating ultra-high-resolution images (e.g. 4096×4096), the resolution of generated images is often limited to 1024×1024. In this work, we propose a unidirectional block attention mechanism that can adaptively adjust the memory overhead during the inference process and handle global dependencies. Building on this module, we adopt the DiT structure for upsampling and develop an infinite super-resolution model capable of upsampling images of various shapes and resolutions. Comprehensive experiments show that our model achieves SOTA performance in generating ultra-high-resolution images in both machine and human evaluation. Compared to commonly used UNet structures, our model can save more than 5x memory when generating 4096×4096 images. The project URL is https://github.com/THUDM/Inf-DiT.
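The abstract's key idea is block-wise attention with a unidirectional dependency pattern: each block of the feature map attends only to itself and already-processed neighbor blocks, so the full quadratic attention map never needs to be materialized at once. The sketch below is a minimal, hypothetical NumPy illustration of that pattern (attending to the self, left, top, and top-left blocks); the function name, block layout, and single-head formulation are assumptions for illustration, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def unidirectional_block_attention(x, block=2):
    """Toy single-head attention over an (H, W, C) feature map.

    Each (block x block) tile attends only to itself and its
    left / top / top-left neighbor tiles -- a simplified stand-in
    for the unidirectional block attention described in the abstract.
    """
    H, W, C = x.shape
    nh, nw = H // block, W // block
    out = np.zeros_like(x)
    for bi in range(nh):
        for bj in range(nw):
            # Gather keys/values only from allowed (already-seen) blocks.
            ctx = []
            for di, dj in [(0, 0), (0, -1), (-1, 0), (-1, -1)]:
                ni, nj = bi + di, bj + dj
                if ni >= 0 and nj >= 0:
                    tile = x[ni*block:(ni+1)*block, nj*block:(nj+1)*block]
                    ctx.append(tile.reshape(-1, C))
            kv = np.concatenate(ctx, axis=0)
            q = x[bi*block:(bi+1)*block, bj*block:(bj+1)*block].reshape(-1, C)
            # Attention is local to a few blocks, so memory stays bounded
            # regardless of the overall image resolution.
            attn = softmax(q @ kv.T / np.sqrt(C))
            out[bi*block:(bi+1)*block, bj*block:(bj+1)*block] = \
                (attn @ kv).reshape(block, block, C)
    return out
```

Because each query only ever sees a constant number of neighboring blocks, the attention cost per block is O(block⁴) rather than O((HW)²), which is the intuition behind the memory savings the abstract reports.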

Authors (8)
  1. Zhuoyi Yang (18 papers)
  2. Heyang Jiang (3 papers)
  3. Wenyi Hong (14 papers)
  4. Jiayan Teng (8 papers)
  5. Wendi Zheng (12 papers)
  6. Yuxiao Dong (119 papers)
  7. Ming Ding (219 papers)
  8. Jie Tang (302 papers)
Citations (2)

