- The paper introduces DRSformer featuring a Sparse Transformer Block with top-k sparse attention and a mixed-scale feed-forward network for effective rain removal.
- It employs an adaptive top-k selection operator that filters irrelevant features, yielding improved PSNR and SSIM scores on benchmarks like Rain200L, Rain200H, and SPA-Data.
- The integration of a mixture-of-experts feature compensator and multi-scale processing demonstrates the method's potential for enhancing visual clarity in adverse weather conditions.
Learning A Sparse Transformer Network for Effective Image Deraining
The paper "Learning A Sparse Transformer Network for Effective Image Deraining" focuses on addressing the challenges posed by traditional convolutional neural networks (CNNs) and Transformer-based models in efficiently removing rain streaks from images. The paper introduces the DeRaining Sparse Transformer Network (DRSformer), which integrates a novel sparse attention mechanism to improve image restoration by selectively aggregating useful features while discarding irrelevant information.
Key Contributions
DRSformer is a targeted modification of the standard Transformer architecture aimed at improving image deraining performance. The paper's key contributions are:
- Sparse Transformer Block (STB): At the core of DRSformer is the STB, which includes the Top-K Sparse Attention (TKSA) and the Mixed-Scale Feed-Forward Network (MSFN). The integration of these elements effectively balances the global and local feature representations, essential for high-quality image restoration.
- Top-K Sparse Attention (TKSA): Unlike standard self-attention, which aggregates similarities over all tokens, TKSA uses a top-k selection operator to keep only the most relevant self-attention scores and mask out the rest. This reduces interference from irrelevant or rain-corrupted features and sharpens the reconstructed image. The selection is adaptive, so the degree of sparsity adjusts to the spatial dependencies in the input (see the first sketch after this list).
- Mixed-Scale Feed-Forward Network (MSFN): To capture the multi-scale information needed to resolve fine detail in derained images, the feed-forward path uses parallel convolutions with different kernel sizes, enriching multi-scale feature extraction and bolstering restoration quality (second sketch below).
- Mixture of Experts Feature Compensator (MEFC): MEFC applies a set of diverse expert CNN operations at strategic stages and blends their outputs adaptively, compensating for features the attention path misses and enriching feature diversity for detail recovery (third sketch below).
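The top-k masking at the heart of TKSA can be sketched in a few lines of PyTorch. This is a minimal illustration assuming standard scaled dot-product attention over tokens and a fixed `keep_ratio`; in DRSformer the operator is built into the Sparse Transformer Block and the retained fraction is chosen adaptively.

```python
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, keep_ratio=0.5):
    """Sketch of top-k sparse attention: retain only the largest
    attention scores in each row and mask the rest before softmax.

    q, k, v: (batch, heads, tokens, dim) tensors.
    keep_ratio: fraction of scores kept per query (an assumed
    hyperparameter; DRSformer adapts this choice during learning).
    """
    scale = q.shape[-1] ** -0.5
    attn = (q @ k.transpose(-2, -1)) * scale           # (B, H, N, N) similarity scores
    k_keep = max(1, int(attn.shape[-1] * keep_ratio))  # number of scores kept per row
    # Find the k-th largest score in each row; everything below it is masked out.
    topk_vals, _ = torch.topk(attn, k_keep, dim=-1)
    threshold = topk_vals[..., -1, None]
    attn = attn.masked_fill(attn < threshold, float('-inf'))
    return F.softmax(attn, dim=-1) @ v
```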
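A minimal PyTorch sketch of the mixed-scale idea follows. The 3x3/5x5 depth-wise kernel pair, the expansion factor, and the layer names are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class MixedScaleFeedForward(nn.Module):
    """Sketch of a mixed-scale feed-forward block: parallel depth-wise
    convolutions with different kernel sizes extract multi-scale local
    detail between two 1x1 projections."""

    def __init__(self, channels, expansion=2):
        super().__init__()
        hidden = channels * expansion
        self.proj_in = nn.Conv2d(channels, hidden, kernel_size=1)
        self.dw3 = nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden)
        self.dw5 = nn.Conv2d(hidden, hidden, 5, padding=2, groups=hidden)
        self.proj_out = nn.Conv2d(hidden * 2, channels, kernel_size=1)
        self.act = nn.GELU()

    def forward(self, x):
        h = self.act(self.proj_in(x))
        # Fuse the two scales by concatenation before projecting back.
        h = torch.cat([self.act(self.dw3(h)), self.act(self.dw5(h))], dim=1)
        return self.proj_out(h)
```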
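Finally, a hedged sketch of an expert-based compensator: the particular expert set (plain, dilated, and depth-wise separable convolutions) and the softmax gating are assumptions chosen to illustrate the mixture-of-experts pattern the paper describes, not its exact operator set.

```python
import torch
import torch.nn as nn

class ExpertFeatureCompensator(nn.Module):
    """Sketch of a mixture-of-experts feature compensator: parallel
    convolutional "experts" whose outputs are blended by learned
    softmax weights and added back as a compensating residual."""

    def __init__(self, channels):
        super().__init__()
        self.experts = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=1),               # plain conv (assumed expert)
            nn.Conv2d(channels, channels, 3, padding=2, dilation=2),   # dilated conv (assumed expert)
            nn.Sequential(                                             # separable conv (assumed expert)
                nn.Conv2d(channels, channels, 3, padding=1, groups=channels),
                nn.Conv2d(channels, channels, 1),
            ),
        ])
        self.gate = nn.Parameter(torch.zeros(len(self.experts)))

    def forward(self, x):
        w = torch.softmax(self.gate, dim=0)
        # Weighted sum of expert outputs, added back as compensation.
        return x + sum(wi * expert(x) for wi, expert in zip(w, self.experts))
```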
Performance and Results
Evaluated on several benchmark datasets, including Rain200L, Rain200H, and the real-world SPA-Data, DRSformer consistently outperforms state-of-the-art methods in PSNR and SSIM. Notably, it surpasses prominent CNN-based models such as MPRNet and Transformer-based models such as Restormer, especially on complex rain streaks. Qualitative comparisons likewise show that DRSformer removes rain artifacts while effectively preserving image textures.
Implications and Future Directions
The implications of this research extend into practical applications where visual clarity is paramount, such as in autonomous driving, surveillance systems, and outdoor photography under adverse weather conditions. By addressing the inefficiencies in dense attention mechanisms through sparse representation, DRSformer sets a precedent for further explorations into sparsity-driven Transformers in various image restoration tasks.
Future work could further optimize the efficiency of the sparse attention mechanism and investigate its applicability to other weather-induced degradations, such as fog or snow, toward a robust, generalizable solution for adverse-weather vision systems. Model size and computational cost also merit study to improve real-time applicability.
In conclusion, the paper presents a meticulous study of leveraging sparsity in Transformers to refine image deraining, offering substantial improvements over existing methods and paving the way for novel low-level vision solutions.