Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild (2401.13627v2)

Published 24 Jan 2024 in cs.CV

Abstract: We introduce SUPIR (Scaling-UP Image Restoration), a groundbreaking image restoration method that harnesses generative prior and the power of model scaling up. Leveraging multi-modal techniques and advanced generative prior, SUPIR marks a significant advance in intelligent and realistic image restoration. As a pivotal catalyst within SUPIR, model scaling dramatically enhances its capabilities and demonstrates new potential for image restoration. We collect a dataset comprising 20 million high-resolution, high-quality images for model training, each enriched with descriptive text annotations. SUPIR provides the capability to restore images guided by textual prompts, broadening its application scope and potential. Moreover, we introduce negative-quality prompts to further improve perceptual quality. We also develop a restoration-guided sampling method to suppress the fidelity issue encountered in generative-based restoration. Experiments demonstrate SUPIR's exceptional restoration effects and its novel capacity to manipulate restoration through textual prompts.

Summary

The paper introduces SUPIR, a scalable image restoration framework that uses SDXL and extensive high-quality data to boost perceptual quality.
It employs a robust architecture with a degradation-resistant encoder and ZeroSFT connector to reduce computational load while enhancing output fidelity.
SUPIR combines restoration-guided sampling with textual prompts and outperforms state-of-the-art methods on non-reference perceptual metrics in real-world scenarios.

Scaling Up to Excellence: Model Scaling for Photo-Realistic Image Restoration

The paper "Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild" introduces SUPIR, an image restoration (IR) model that scales up both the generative prior and the training data. By combining a very large dataset with a large generative model, SUPIR aims to improve both the fidelity and the perceptual quality of restored images.

Introduction and Motivation

As image restoration has progressed, expectations for the perceptual quality and intelligence of IR results have risen. Methods built on generative priors have improved substantially as the underlying generative models have become stronger. Pushing these methods further requires scaling the models up, yet the authors argue that scaling in IR has so far been held back by engineering constraints such as computing resources and architecture design.

Core Approach and Methodology

Generative Prior and Degradation-Robust Encoder

SUPIR employs the StableDiffusion-XL (SDXL) model as its generative prior, since SDXL can generate high-resolution images efficiently. The encoder is fine-tuned to be robust against degradations, which keeps the latent representation consistent when the input is a low-quality (LQ) image.

Extensive Data Collection

A critical component of SUPIR's training is a bespoke dataset of 20 million high-resolution, high-quality images, each annotated with a detailed text description. The authors describe this scale and quality of data as unprecedented in the IR domain. Two further sets supplement it: 70k high-resolution facial images, which improve face restoration, and 100k low-quality images generated with SDXL, which teach the model negative-quality concepts.

Model Scaling and Adaptor Design

To use SDXL effectively within an IR framework, the authors designed a scalable adaptor that follows the ControlNet paradigm. By trimming half of the Vision Transformer blocks and introducing the ZeroSFT connector, they lower the computational burden while keeping the adaptor able to steer a model with as many parameters as SDXL.
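
The connector idea can be illustrated with a minimal sketch. It assumes that ZeroSFT combines zero-initialized projections with a spatial feature transform (a learned scale and shift), so that the adaptor leaves the frozen model's features unchanged at the start of training. The class name `ZeroSFTSketch`, the per-channel scalar weights, and the use of plain lists instead of tensors are simplifications for illustration, not the authors' implementation.

```python
class ZeroSFTSketch:
    """Zero-initialized scale-and-shift modulation (simplified, per channel)."""

    def __init__(self, channels):
        # Hypothetical per-channel weights that map the control signal to a
        # scale (gamma) and a shift (beta). Both start at zero.
        self.w_gamma = [0.0] * channels
        self.w_beta = [0.0] * channels

    def __call__(self, features, control):
        out = []
        for x, c, wg, wb in zip(features, control, self.w_gamma, self.w_beta):
            gamma = wg * c  # scale derived from the control features
            beta = wb * c   # shift derived from the control features
            # With zero weights this reduces to the identity: out == features.
            out.append(x * (1.0 + gamma) + beta)
        return out


connector = ZeroSFTSketch(channels=2)
untrained = connector([3.0, -1.0], [0.5, 0.5])  # identity at initialization

connector.w_gamma = [1.0, 0.0]
connector.w_beta = [2.0, 0.0]
trained = connector([3.0, -1.0], [0.5, 0.5])    # first channel is modulated
```

Starting from the identity means the pretrained generative prior is not disturbed before the adaptor has learned anything useful, which is the usual motivation for zero initialization in ControlNet-style designs.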

Restoration-Guided Sampling and Textual Prompts

One of SUPIR's contributions is the use of textual prompts to guide the restoration process. By adopting the LLaVA multi-modal LLM, SUPIR can understand and exploit textual descriptions of the image content. The model also employs a restoration-guided sampling strategy that keeps the generated content faithful to the LQ input. A hyper-parameter adjusts the strength of this guidance during the diffusion process, which lets the method trade fidelity against perceptual quality.
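
The fidelity/quality trade-off can be illustrated with a minimal sketch. It assumes that, at each sampling step, the predicted latent is pulled toward the LQ latent with a weight that shrinks as the noise level falls. The function name `restoration_guided_step`, the parameter `tau_r`, and the power-law weight are illustrative assumptions and may differ from the paper's exact formulation.

```python
def restoration_guided_step(z_pred, z_lq, sigma, sigma_max, tau_r):
    """Pull a predicted latent toward the LQ latent (illustrative sketch).

    z_pred, z_lq: lists of floats standing in for latent tensors.
    sigma, sigma_max: current and maximum noise level, 0 <= sigma <= sigma_max.
    tau_r: hypothetical guidance hyper-parameter; for sigma < sigma_max,
           a larger value gives a smaller weight and thus weaker guidance.
    """
    if sigma_max <= 0.0 or not 0.0 <= sigma <= sigma_max:
        raise ValueError("expected 0 <= sigma <= sigma_max with sigma_max > 0")
    # Assumed schedule: weight 1 at the highest noise level, decaying to 0.
    k = (sigma / sigma_max) ** tau_r
    return [p + k * (q - p) for p, q in zip(z_pred, z_lq)]


early = restoration_guided_step([1.0, 2.0], [0.0, 0.0], 1.0, 1.0, 2.0)
middle = restoration_guided_step([1.0, 2.0], [0.0, 0.0], 0.5, 1.0, 1.0)
late = restoration_guided_step([1.0, 2.0], [0.0, 0.0], 0.0, 1.0, 2.0)
```

Under this assumed schedule, early (high-noise) steps stay close to the LQ latent and preserve structure, while late (low-noise) steps are left free to synthesize fine detail.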

Experimental Results

Comparison with State-of-the-Art Methods

Evaluation on several datasets shows that SUPIR outperforms existing methods such as BSRGAN, Real-ESRGAN, StableSR, DiffBIR, and PASD. The qualitative analyses show SUPIR restoring textures and details accurately across a range of degraded images. SUPIR does not always surpass other methods on traditional full-reference metrics such as PSNR and SSIM, but it leads on non-reference metrics such as MANIQA, CLIPIQA, and MUSIQ, which reflect perceptual quality and align more closely with human judgments.
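
To make the distinction concrete, here is a minimal sketch of PSNR, a full-reference metric: it needs the ground-truth image and measures only pixel-wise error, so a sharp, plausible texture that differs from the ground truth is penalized. The function name `psnr` and the use of flat lists of pixel values are illustrative choices.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Both images are given as flat lists of pixel values in [0, max_value].
    """
    if not reference or len(reference) != len(restored):
        raise ValueError("images must be non-empty and the same size")
    mse = sum((r - x) ** 2 for r, x in zip(reference, restored)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


identical = psnr([10.0, 20.0], [10.0, 20.0])
small_error = psnr([0.0], [25.5])
worst_case = psnr([0.0, 0.0], [255.0, 255.0])
```

Non-reference metrics, by contrast, score the restored image alone with a learned quality model and need no ground truth, which is why the two families of metrics can disagree.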

Restoration in Real-World Scenarios

On real-world LQ images, SUPIR achieves superior qualitative results and also ranks highly in user studies, which supports its practical applicability across diverse content, from landscapes to portraits.

Ablation Studies and Analysis

Impact of Training Data and Model Architecture

Ablation studies reveal the significance of large-scale high-quality training data and the effectiveness of the ZeroSFT connector. Comparisons with models trained on smaller datasets such as DIV2K and LSDIR underscore the necessity of expansive datasets to achieve the desired performance.

Negative Prompts and Quality Analysis

Negative prompts, applied through classifier-free guidance, significantly enhance visual quality. The authors also highlight that without negative-quality samples in the training phase, the model risks generating artifacts when negative prompts are used.
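
The mechanism can be sketched with the standard classifier-free guidance formula, in which the prediction conditioned on the negative prompt takes the place of the unconditional prediction. The function name `guided_prediction` and the list-of-floats inputs are illustrative; SUPIR's actual prompts and guidance scale are not reproduced here.

```python
def guided_prediction(pred_positive, pred_negative, guidance_scale):
    """Classifier-free guidance with a negative prompt (illustrative sketch).

    pred_positive: model output conditioned on the positive (quality) prompt.
    pred_negative: model output conditioned on the negative-quality prompt.
    guidance_scale: how far to extrapolate away from the negative prediction.
    """
    return [
        n + guidance_scale * (p - n)
        for p, n in zip(pred_positive, pred_negative)
    ]


same_as_positive = guided_prediction([3.0, 4.0], [1.0, 4.0], 1.0)
extrapolated = guided_prediction([3.0, 4.0], [1.0, 4.0], 2.0)
```

A scale of 1 returns the positive prediction unchanged, while a scale above 1 pushes the result further from whatever the model associates with the negative prompt. This only helps if the model has actually learned what "low quality" looks like, which is the role of the negative-quality training samples.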

Implications and Future Directions

SUPIR has implications for both research and practice. It shows that model scaling, a strong generative prior, and a large high-quality dataset can be combined to raise the bar for image restoration. The ability to control restoration through textual prompts offers a new avenue for user interaction in image editing. Future work might extend the framework to video restoration and broaden its multi-modal aspects to cover more diverse data types.

Conclusion

The authors present a comprehensive approach to scaling in image restoration that draws on a large model architecture and an extensive dataset. Their method, SUPIR, makes significant progress toward intelligent, photo-realistic image restoration.

Overall, the paper opens avenues for both new research and practical applications in AI-driven image enhancement.