DemoFusion: Democratising High-Resolution Image Generation With No $$$ (2311.16973v2)

Published 24 Nov 2023 in cs.CV, cs.AI, and cs.LG

Abstract: High-resolution image generation with Generative Artificial Intelligence (GenAI) has immense potential but, due to the enormous capital investment required for training, it is increasingly centralised to a few large corporations, and hidden behind paywalls. This paper aims to democratise high-resolution GenAI by advancing the frontier of high-resolution generation while remaining accessible to a broad audience. We demonstrate that existing Latent Diffusion Models (LDMs) possess untapped potential for higher-resolution image generation. Our novel DemoFusion framework seamlessly extends open-source GenAI models, employing Progressive Upscaling, Skip Residual, and Dilated Sampling mechanisms to achieve higher-resolution image generation. The progressive nature of DemoFusion requires more passes, but the intermediate results can serve as "previews", facilitating rapid prompt iteration.

Authors: Ruoyi Du, Dongliang Chang, Timothy Hospedales, Yi-Zhe Song, Zhanyu Ma


Summary

The paper introduces DemoFusion, a framework that extends latent diffusion models to produce images at 4096 × 4096 pixels and beyond on accessible hardware.
It combines three techniques (Progressive Upscaling, Skip Residual, and Dilated Sampling) to maintain global consistency while enriching image detail.
DemoFusion provides rapid low-resolution previews for prompt iteration, at the cost of longer runtimes and occasional minor artifacts in high-resolution outputs.

Democratizing High-Resolution Image Generation with DemoFusion

In Generative Artificial Intelligence (GenAI), high-resolution image generation attracts intense interest. The computational cost of training models capable of it is substantial, however, which leaves individuals and academic institutions behind the large corporations that can afford such investments. As a result, high-resolution image generation has become a privilege rather than a widely accessible resource.

DemoFusion is a framework that addresses this imbalance. It extends the capabilities of existing Latent Diffusion Models (LDMs) without further expensive training or excessive memory. It can generate images at 4096 × 4096 pixels and beyond, a significant leap over the 1024 × 1024 output of a base model such as SDXL. It does so on a standard RTX 3090 GPU, "working class" hardware by GenAI standards, which makes it accessible to a broader user base.

Central to DemoFusion are three techniques: Progressive Upscaling, Skip Residual, and Dilated Sampling. Progressive Upscaling works iteratively: each pass starts from the result of the previous one and refines it at a higher resolution. Skip Residual uses intermediate, noise-inverted representations from the previous resolution as guides during denoising, which maintains global consistency and semantic coherence at higher resolutions. Dilated Sampling establishes global denoising paths, so that newly added details fit the context of the image as a whole.
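The three mechanisms can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation. It assumes that the resolution grows by one base-sized step per phase, that the Skip Residual guide is blended in with a cosine-decayed weight, and that Dilated Sampling takes interleaved (strided) sub-grids of the latent. All function names and the `alpha` parameter are hypothetical, and a plain nested list stands in for a latent tensor.

```python
import math


def progressive_sizes(base, scale):
    """Side lengths visited when growing from `base` to `base * scale`.

    Assumption: one phase per integer multiple of the base side length.
    """
    return [base * s for s in range(1, scale + 1)]


def skip_residual_weight(t, total_steps, alpha=3.0):
    """Hypothetical cosine-decayed guide weight.

    Returns 1 at the noisiest step (t == total_steps) and 0 at t == 0,
    so the lower-resolution guide dominates early and fades out later.
    """
    return ((1 + math.cos(math.pi * (total_steps - t) / total_steps)) / 2) ** alpha


def blend(guide, current, weight):
    """Mix the noise-inverted guide into the current latent, element-wise."""
    return [
        [weight * g + (1 - weight) * c for g, c in zip(g_row, c_row)]
        for g_row, c_row in zip(guide, current)
    ]


def dilated_samples(latent, stride):
    """Split a 2-D grid into stride * stride interleaved sub-grids.

    Each sub-grid spans the whole image at reduced density, which is what
    lets a base-resolution denoiser see global context.
    Assumes both grid dimensions are divisible by `stride`.
    """
    return {
        (i, j): [row[j::stride] for row in latent[i::stride]]
        for i in range(stride)
        for j in range(stride)
    }


def merge_dilated(samples, stride):
    """Inverse of `dilated_samples`: write each sub-grid back in place."""
    first = samples[(0, 0)]
    height, width = len(first) * stride, len(first[0]) * stride
    out = [[None] * width for _ in range(height)]
    for (i, j), sub in samples.items():
        for r, row in enumerate(sub):
            for c, value in enumerate(row):
                out[i + r * stride][j + c * stride] = value
    return out


# Toy 4 x 4 "latent" split with stride 2 into four 2 x 2 global views.
latent = [[r * 4 + c for c in range(4)] for r in range(4)]
subs = dilated_samples(latent, 2)
```

In this toy run, `subs[(0, 0)]` holds every second element of every second row, so each sub-grid covers the full extent of the latent. In an actual pipeline each sub-grid would be denoised by the base model before `merge_dilated` writes the results back.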

Despite these advantages, DemoFusion is not without trade-offs. The primary one is the longer runtime needed to generate high-resolution images. This is partly offset by the low-resolution ‘previews’ that the framework produces early on, which let users iterate on prompts quickly before committing to the full high-resolution rendering process.
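A rough back-of-the-envelope calculation shows why an early preview is cheap compared with the full run. The sketch below assumes that the cost of each phase is proportional to its pixel count and that the phases run at 1×, 2×, 3×, and 4× the base side length. Both are simplifying assumptions rather than measured runtimes, and the function name is hypothetical.

```python
def relative_phase_costs(base, scale):
    """Pixel area of each phase relative to the base-resolution phase.

    Assumption: compute cost scales linearly with pixel count, ignoring
    any overhead from overlapping patches or extra sampling passes.
    """
    return [(base * s) ** 2 / base ** 2 for s in range(1, scale + 1)]


costs = relative_phase_costs(1024, 4)   # phases at 1024, 2048, 3072, 4096
preview_share = costs[0] / sum(costs)   # fraction spent on the first preview
final_share = costs[-1] / sum(costs)    # fraction spent on the last phase
```

Under these assumptions the base-resolution preview accounts for 1/30 of the total work, while the final phase alone accounts for 16/30, which is why iterating on prompts at the preview stage saves most of the cost.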

The practical applications of DemoFusion are promising, with demonstrated scenarios ranging from enhancing realism in digital art to updating older images to match modern display resolutions. It has limitations, however: DemoFusion can occasionally produce irrational content in local regions or small repetitive elements in certain contexts. Future iterations may overcome these issues.

In summary, DemoFusion is a significant step toward democratizing high-resolution image generation and making sophisticated GenAI technology more widely available. Through plug-and-play compatibility with existing open-source models, it unlocks the untapped potential of LDMs, raising their output resolution while keeping memory demands and computational cost manageable. It is a notable advance toward more equitable access to, and innovation in, high-resolution image synthesis.
