Pixel-Aware Stable Diffusion for Realistic Image Super-resolution and Personalized Stylization (2308.14469v4)

Published 28 Aug 2023 in cs.CV

Abstract: Diffusion models have demonstrated impressive performance in various image generation, editing, enhancement and translation tasks. In particular, the pre-trained text-to-image stable diffusion models provide a potential solution to the challenging realistic image super-resolution (Real-ISR) and image stylization problems with their strong generative priors. However, the existing methods along this line often fail to keep faithful pixel-wise image structures. If extra skip connections between the encoder and the decoder of a VAE are used to reproduce details, additional training in image space will be required, limiting the application to tasks in latent space such as image stylization. In this work, we propose a pixel-aware stable diffusion (PASD) network to achieve robust Real-ISR and personalized image stylization. Specifically, a pixel-aware cross attention module is introduced to enable diffusion models to perceive local image structures at the pixel level, while a degradation removal module is used to extract degradation-insensitive features to guide the diffusion process together with high-level image information. An adjustable noise schedule is introduced to further improve the image restoration results. By simply replacing the base diffusion model with a stylized one, PASD can generate diverse stylized images without collecting pairwise training data, and by replacing the base model with an aesthetic one, PASD can bring old photos back to life. Extensive experiments in a variety of image enhancement and stylization tasks demonstrate the effectiveness of our proposed PASD approach. Our source code is available at https://github.com/yangxy/PASD/.

Authors (5)

Tao Yang (520 papers)
Peiran Ren (28 papers)
Xuansong Xie (69 papers)
Lei Zhang (1689 papers)
Rongyuan Wu (11 papers)

Citations (69)

Summary

The paper introduces PASD, a model that incorporates a degradation removal module, pixel-aware cross attention, and high-level feature extraction for enhanced image super-resolution and stylization.
It leverages pre-trained networks like ResNet and CLIP to guide the refinement process, ensuring realistic textures and structural consistency.
According to the paper's experiments, PASD outperforms state-of-the-art methods in visual quality on benchmark datasets and covers both enhancement and stylization by swapping the base diffusion model.

Introduction to Pixel-Aware Stable Diffusion

Realistic image super-resolution (Real-ISR) and personalized image stylization are two active problems in image processing. Real-ISR aims to restore detail in low-quality images so that the result looks realistic to human observers, while personalized stylization applies a specific visual style to an image in a consistent and controllable manner. Pre-trained text-to-image stable diffusion models offer strong generative priors for both tasks, but existing methods built on them often fail to preserve faithful pixel-wise image structures. Pixel-aware stable diffusion (PASD) is proposed to address both Real-ISR and personalized stylization.

Pixel-Aware Stable Diffusion Network

PASD leverages pre-trained stable diffusion models, known for their strong generative capabilities. The network is designed with three key components that enhance its performance:

Degradation Removal Module: This module extracts features from low-quality images that are relatively insensitive to the image’s degradations. It aids the network in focusing on restoring realistic details without the interference of existing image quality issues.
Pixel-Aware Cross Attention (PACA): The PACA module is the core of PASD. It uses an attention mechanism to let the diffusion model perceive local image structures at the pixel level, which keeps generated details consistent with the input and helps avoid artifacts and structural inconsistencies.
High-Level Information Extraction: Using pre-trained networks such as ResNet and CLIP, PASD extracts classification, detection, and captioning information from the input to further guide the refinement process and improve super-resolution results.
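
The attention mechanism behind a module like PACA can be illustrated with a generic scaled dot-product cross attention. The sketch below is not the authors' implementation: it assumes that queries come from the diffusion model's features and that keys and values come from per-pixel features of the input image, flattened so that each spatial position is one token. The learned projection matrices are omitted, and the names flatten_hw, latent, and control are hypothetical.

```python
import math

def softmax(row):
    # Subtract the max for numerical stability before exponentiating.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Scaled dot-product cross attention over plain Python lists.

    queries: n_q vectors of length d (assumed: flattened diffusion features).
    keys, values: n_kv vectors (assumed: flattened per-pixel input features).
    Returns n_q vectors, each a weighted average of the value vectors.
    """
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

def flatten_hw(feature_map):
    # Turn an h x w grid of c-dimensional vectors into a list of h*w vectors,
    # so that every spatial position becomes one attention token.
    return [vec for row in feature_map for vec in row]

# Hypothetical 2x2 feature maps with 2 channels each (illustrative values).
latent = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0], [0.0, 0.0]]]
control = [[[5.0, 0.0], [0.0, 5.0]], [[1.0, 1.0], [0.0, 0.0]]]

tokens_q = flatten_hw(latent)
tokens_kv = flatten_hw(control)

# Identity projections are used here; a real module would learn them.
attended = cross_attention(tokens_q, tokens_kv, tokens_kv)
```

Because each query position attends over every pixel position of the input features, the output at a given location can draw on the input structure that matches it best, which is the intuition behind keeping generated details consistent with the input.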

Beyond super-resolution, replacing the base diffusion model with a stylized one lets PASD generate diverse stylized images without collecting pairwise training data, and replacing it with an aesthetic one lets PASD restore old photos. The paper also introduces an adjustable noise schedule to further improve restoration results.

Real-ISR and Personalized Stylization Performance

On Real-ISR benchmark datasets, the authors report that PASD surpasses state-of-the-art models in generating photo-realistic textures while maintaining content fidelity. For personalized stylization, PASD produces distinct visual styles while preserving the structure of the original image. Because switching between tasks only requires replacing the base model, rather than collecting pairwise training data for each style, the approach is practical for real-world applications.

Conclusion

PASD is a step toward versatile image enhancement models that combine realistic detail generation with stylization in a single framework. With the source code publicly available, the research community can explore and build on the model's capabilities.