Uformer: A Detailed Examination of Transformer-Based Image Restoration
The research paper "Uformer: A General U-Shaped Transformer for Image Restoration" presents a novel architecture for image restoration, a field traditionally dominated by ConvNet-based methods. The paper introduces Uformer, a U-shaped Transformer-based network that captures both local and global dependencies in feature maps across diverse image restoration tasks.
Core Contributions
Uformer is built around two pivotal innovations:
- Locally-Enhanced Window (LeWin) Transformer Block:
- This block performs self-attention within non-overlapping local windows, significantly reducing computational complexity compared to global self-attention.
- To restore the locality that non-overlapping windows sacrifice, a depth-wise convolutional layer is inserted into the feed-forward network (FFN) of the Transformer block, which the authors call the locally-enhanced feed-forward network (LeFF).
- Learnable Multi-Scale Restoration Modulator:
- The modulator is a set of learnable, window-sized tensors added as a multi-scale spatial bias at multiple stages of the Uformer decoder, calibrating features to recover finer restoration details.
- This design adds only a marginal number of extra parameters and negligible computational cost, yet significantly enhances the ability to restore fine details across different image restoration tasks (a minimal sketch of both components follows this list).
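To make the two components concrete, here is a minimal PyTorch sketch of a LeWin block with an optional restoration modulator. It is a simplified illustration of the paper's description, not the authors' reference implementation: the module names, the use of `nn.MultiheadAttention` in place of a hand-rolled window attention, and the assumption that the window size divides the spatial resolution are all mine.

```python
import torch
import torch.nn as nn

class LeFF(nn.Module):
    """Locally-enhanced feed-forward network: a 3x3 depth-wise convolution
    inserted between the two point-wise projections of a standard FFN."""
    def __init__(self, dim, hidden_dim):
        super().__init__()
        self.proj_in = nn.Linear(dim, hidden_dim)
        self.dwconv = nn.Conv2d(hidden_dim, hidden_dim, kernel_size=3,
                                padding=1, groups=hidden_dim)  # depth-wise
        self.proj_out = nn.Linear(hidden_dim, dim)
        self.act = nn.GELU()

    def forward(self, x, h, w):
        # x: (B, H*W, C) token sequence; reshape to a 2-D map for the conv
        b, n, _ = x.shape
        x = self.act(self.proj_in(x))                 # (B, N, hidden)
        x = x.transpose(1, 2).reshape(b, -1, h, w)    # (B, hidden, H, W)
        x = self.act(self.dwconv(x))
        x = x.reshape(b, -1, n).transpose(1, 2)       # (B, N, hidden)
        return self.proj_out(x)

class LeWinBlock(nn.Module):
    def __init__(self, dim, num_heads, window_size=8, use_modulator=False):
        super().__init__()
        self.ws = window_size
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.leff = LeFF(dim, hidden_dim=4 * dim)
        # Restoration modulator: one learnable window-shaped tensor per
        # decoder block, shared across all windows at that scale.
        self.modulator = (nn.Parameter(torch.zeros(window_size ** 2, dim))
                          if use_modulator else None)

    def forward(self, x):
        # x: (B, C, H, W) feature map; assumes H and W divisible by ws
        b, c, h, w = x.shape
        ws = self.ws
        # Partition into non-overlapping ws x ws windows of tokens
        win = x.reshape(b, c, h // ws, ws, w // ws, ws)
        win = win.permute(0, 2, 4, 3, 5, 1).reshape(-1, ws * ws, c)
        tokens = self.norm1(win)
        if self.modulator is not None:
            tokens = tokens + self.modulator          # bias added per window
        attn_out, _ = self.attn(tokens, tokens, tokens)
        win = win + attn_out                          # residual around W-MSA
        # Merge windows back, then apply the locally-enhanced FFN
        x = win.reshape(b, h // ws, w // ws, ws, ws, c)
        x = x.permute(0, 5, 1, 3, 2, 4).reshape(b, c, h, w)
        seq = x.flatten(2).transpose(1, 2)            # (B, H*W, C)
        seq = seq + self.leff(self.norm2(seq), h, w)
        return seq.transpose(1, 2).reshape(b, c, h, w)
```

Restricting attention to small windows keeps the cost linear in image size, while the depth-wise convolution in LeFF reinjects the local context that non-overlapping windows alone would miss.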
Performance Evaluation
The effectiveness of Uformer is assessed through extensive experiments on a range of image restoration tasks, including image denoising, motion deblurring, defocus deblurring, and deraining. Notable results include:
- Image Denoising:
- On the SIDD dataset, Uformer-B achieves a PSNR of 39.89 dB, outperforming the previous state-of-the-art method NBNet by 0.14 dB.
- On the DND dataset, Uformer-B achieves a PSNR of 39.98 dB, showing a 0.09 dB improvement over NBNet.
- Motion Deblurring:
- Uformer achieves a PSNR of 32.97 dB on the GoPro dataset, surpassing previous methods like MPRNet.
- For real-world deblurring on the RealBlur dataset, Uformer achieves a PSNR of 36.22 dB and 29.06 dB on RealBlur-R and RealBlur-J respectively, indicating superior generalization to real scenes.
- Defocus Deblurring:
- On the DPD dataset, Uformer outperforms the previous best models by a substantial margin, achieving a PSNR improvement of 1.04 dB over KPAC.
- Deraining:
- Uformer achieves significant performance on the SPAD dataset with a PSNR of 47.84 dB, outperforming SPAIR by 3.74 dB.
Theoretical and Practical Implications
From a theoretical standpoint, the Uformer architecture demonstrates that Transformers, suitably adapted, can capture both local dependencies and the long-range, global dependencies that fixed-receptive-field ConvNets struggle to model. The hierarchical, window-based structure enables efficient handling of high-resolution images, a critical requirement in image restoration tasks.
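A quick back-of-the-envelope calculation makes the efficiency argument tangible: for a feature map with HW tokens and window size M, global self-attention requires on the order of (HW)² pairwise token interactions per layer, while window attention needs only HW·M². The numbers below are illustrative, not taken from the paper:

```python
# Pairwise token interactions in the attention map of one layer.
H, W, M = 256, 256, 8               # feature-map size and window size (illustrative)
global_cost = (H * W) ** 2           # global self-attention: O((HW)^2)
window_cost = (H * W) * M ** 2       # window self-attention: O(HW * M^2)
print(f"{global_cost / window_cost:.0f}x")  # -> 1024x fewer interactions
```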
Practically, the Uformer approach offers an efficient, high-performing method for image restoration that applies broadly across multiple types of image degradation. Its scalable and modular design allows it to be adapted to various restoration tasks without significant additional computational overhead, as the schematic below illustrates.
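The following skeleton shows how the pieces compose into the U shape, reusing the `LeWinBlock` sketch from above (same imports). The stage count, channel widths, and head counts are placeholders; only the overall layout (encoder, bottleneck, decoder with skip connections, modulators in the decoder blocks, and a global residual) follows the paper.

```python
# Continues the imports and LeWinBlock from the sketch above.
class UformerSketch(nn.Module):
    def __init__(self, dim=32):
        super().__init__()
        self.embed = nn.Conv2d(3, dim, 3, padding=1)          # input projection
        self.enc1 = LeWinBlock(dim, num_heads=1)               # encoder stage
        self.down = nn.Conv2d(dim, dim * 2, 4, stride=2, padding=1)
        self.bottleneck = LeWinBlock(dim * 2, num_heads=2)
        self.up = nn.ConvTranspose2d(dim * 2, dim, 2, stride=2)
        # Decoder blocks carry the learnable restoration modulator
        self.dec1 = LeWinBlock(dim * 2, num_heads=2, use_modulator=True)
        self.out = nn.Conv2d(dim * 2, 3, 3, padding=1)         # output projection

    def forward(self, x):
        e1 = self.enc1(self.embed(x))
        d = self.up(self.bottleneck(self.down(e1)))
        d = self.dec1(torch.cat([d, e1], dim=1))   # U-Net skip connection
        return x + self.out(d)                     # global residual connection

# e.g. UformerSketch()(torch.randn(1, 3, 64, 64)) -> shape (1, 3, 64, 64)
```

Because the output is formed as the degraded input plus a predicted residual, the same layout is trained per task (denoising, deblurring, deraining) in the paper without architectural changes.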
Future Directions
Potential future developments may include extending the Uformer architecture to other domains within computer vision, such as image generation and super-resolution, to validate its versatility further. Additionally, exploring the integration of Uformer with large-scale pre-trained models could further enhance its performance, especially for complex restoration tasks where data scarcity is a challenge.
In conclusion, Uformer represents a significant step forward in applying Transformer-based solutions to image restoration. The blend of efficient window-based self-attention and locally-enhanced network components sets a new benchmark in balancing computational efficiency with restoration quality. The learnable multi-scale restoration modulator further broadens Uformer's utility across tasks, making it a compelling option for future research and application in image restoration.