MRIR: Integrating Multimodal Insights for Diffusion-based Realistic Image Restoration (2407.03635v1)

Published 4 Jul 2024 in cs.CV

Abstract: Realistic image restoration is a crucial task in computer vision, and diffusion-based models have garnered significant attention for it because of their ability to produce realistic results. However, the quality of the generated images remains a significant challenge, owing to the severity of image degradation and the uncontrollability of the diffusion model. In this work, we explore the potential of pre-trained Stable Diffusion for image restoration and propose MRIR, a diffusion-based restoration method with multimodal insights. Specifically, we approach the problem from two perspectives: the textual level and the visual level. At the textual level, we use a pre-trained multimodal LLM to infer meaningful semantic information from low-quality images. Furthermore, we employ the CLIP image encoder with a designed Refine Layer to capture image details as a supplement. At the visual level, we focus mainly on pixel-level control, using a Pixel-level Processor and ControlNet to control spatial structures. Finally, we integrate this control information into the denoising U-Net using multi-level attention mechanisms, achieving controllable image restoration with multimodal insights. Qualitative and quantitative results demonstrate our method's superiority over other state-of-the-art methods on both synthetic and real-world datasets.
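
To make the idea of injecting control information through attention more concrete, the following is a minimal sketch of generic cross-attention, in which U-Net feature vectors attend over a concatenated set of text and image tokens. It is not the paper's multi-level attention design: the function names, the toy vectors, the omission of learned query/key/value projections, and the residual addition are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, context):
    """Let each query vector attend over the context tokens.

    Assumption: learned Q/K/V projections are omitted (identity) to keep
    the sketch short; a real model would learn them.
    """
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    out = []
    for q in queries:
        # Scaled dot-product score between this query and every context token.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in context]
        weights = softmax(scores)
        # Weighted sum of the context tokens.
        attended = [sum(w * v[j] for w, v in zip(weights, context)) for j in range(dim)]
        # Residual connection: add the attended context onto the feature.
        out.append([qi + ai for qi, ai in zip(q, attended)])
    return out


# Hypothetical toy inputs, not taken from the paper.
unet_features = [[1.0, 0.0], [0.0, 1.0]]  # two spatial positions
text_tokens = [[1.0, 0.0]]                # stand-in for caption embeddings
image_tokens = [[0.0, 1.0]]               # stand-in for refined CLIP image embeddings

conditioned = cross_attention(unet_features, text_tokens + image_tokens)
```

In this toy setup each spatial feature draws more heavily on the context token it is most similar to, which is the basic mechanism by which textual and visual guidance could steer the denoising features.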

Authors (6)

Yuhong Zhang (27 papers)
Hengsheng Zhang (6 papers)
Xinning Chai (7 papers)
Rong Xie (24 papers)
Li Song (72 papers)
Wenjun Zhang (160 papers)


