- The paper introduces DehazeFormer, a novel Vision Transformer variant that enhances image dehazing using innovations like RescaleNorm and SoftReLU.
- Its large variant exceeds 40 dB PSNR on the SOTS indoor set while requiring fewer parameters and less computation than traditional CNN approaches.
- The study also presents the RS-Haze dataset for remote sensing, establishing a new benchmark for evaluating non-homogeneous haze scenarios.
Vision Transformers for Single Image Dehazing
The paper, titled "Vision Transformers for Single Image Dehazing", introduces DehazeFormer, a novel Transformer-based architecture tailored to single-image dehazing. Traditional image dehazing methods have relied heavily on Convolutional Neural Networks (CNNs); this paper instead explores Vision Transformers (ViTs) for the task. The authors identify key shortcomings of the Swin Transformer when applied to low-level vision tasks such as image dehazing and propose modifications to address them.
Key Contributions
- DehazeFormer Design:
- The paper introduces DehazeFormer, proposing improvements such as modified normalization layers, activation functions, and spatial information aggregation schemes.
- Rescale Layer Normalization (RescaleNorm) is introduced to preserve inter-patch relativity and reintroduce the statistics of the feature map post-normalization.
- A new non-linear activation function, SoftReLU, is presented to replace GELU, which the authors argue is less suitable for image dehazing.
- Shifted window partitioning with reflection padding is proposed to maintain constant window sizes along image edges, addressing limitations of the cyclic shift in Swin Transformers.
- High Performance:
- DehazeFormer achieves significant improvements over traditional CNN-based methods. On the SOTS indoor set, the large DehazeFormer model exceeds 40 dB PSNR, placing it among the strongest image dehazing methods.
- Introduction of a Remote Sensing Dataset:
- The authors introduce a synthetic dataset, RS-Haze, to evaluate performance in the context of highly non-homogeneous haze, which is prevalent in remote sensing images.
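The normalization idea behind RescaleNorm can be illustrated with a minimal sketch: normalize a feature map but retain its statistics, so they can be reintroduced after the transformer block rather than discarded. This is an illustrative simplification, not the paper's implementation; the function names and the flat-list stand-in for a feature map are ours.

```python
import math

def rescale_norm(feature, eps=1e-5):
    """Normalize a feature map (here a flat list of floats) to zero mean
    and unit variance, returning the statistics so they can be
    reintroduced later instead of being lost."""
    n = len(feature)
    mean = sum(feature) / n
    var = sum((x - mean) ** 2 for x in feature) / n
    std = math.sqrt(var + eps)
    normalized = [(x - mean) / std for x in feature]
    return normalized, mean, std

def reintroduce_stats(block_output, mean, std):
    """Rescale the block's output with the saved statistics, so the
    feature map's overall magnitude is preserved across the block."""
    return [y * std + mean for y in block_output]

# Usage: with an identity "block", reintroducing the statistics
# recovers the original feature map.
feat = [1.0, 2.0, 3.0, 4.0]
normed, mu, sigma = rescale_norm(feat)
restored = reintroduce_stats(normed, mu, sigma)
```

Because the whole map shares one mean and one standard deviation, the normalization does not alter the relative ordering of patch activations, which is the "inter-patch relativity" the paper aims to preserve.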
Results and Evaluation
The paper provides a detailed experimental setup for both training and testing across several datasets, including RESIDE and RS-Haze. The quantitative results show that DehazeFormer not only outperforms CNN-based methods but also requires fewer parameters and less computation.
- On RESIDE datasets, DehazeFormer consistently shows superior results in PSNR and SSIM when compared to well-established methods like FFA-Net and AECR-Net.
- Qualitative comparisons show that DehazeFormer produces clearer images with fewer artifacts.
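For context on the PSNR figures quoted above, the metric itself is simple to compute: it is the log-scaled ratio of the peak signal power to the mean squared error between a restored image and its reference. A minimal sketch over flat pixel lists (the sample values here are illustrative, not from the paper's experiments):

```python
import math

def psnr(reference, restored, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two images, given as
    flat lists of pixel values in [0, max_val]."""
    mse = sum((r - d) ** 2 for r, d in zip(reference, restored)) / len(reference)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)

# A per-pixel error of about 0.01 on a [0, 1] scale already corresponds
# to roughly 40 dB, the regime reported for DehazeFormer's large model.
value = psnr([0.0, 0.5, 1.0], [0.01, 0.5, 0.99])  # ~41.8 dB
```

Higher is better: every 20 dB corresponds to a 10x reduction in root-mean-square error, so differences of even 1 dB between methods are meaningful.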
Practical and Theoretical Implications
The introduction of DehazeFormer offers an alternative to CNN-dominated methods, leveraging the strengths of Vision Transformers. This approach can potentially reshape dehazing methodologies by improving accuracy and efficiency. The architectural innovations suggest broader applicability beyond dehazing, possibly influencing other low-level vision tasks.
Future Directions
The research opens new avenues for extending Transformer architectures to low-level vision tasks. Future investigations could focus on:
- Streamlining the model for real-time dehazing applications.
- Extending the proposed architectural paradigms to other vision applications beyond dehazing.
- Exploring the potential of the introduced RS-Haze dataset for training models specific to remote sensing applications.
The paper underscores the transformative potential of Vision Transformers in image dehazing, presenting a compelling case for their adoption in future research and applications.