Hybrid Attention Transformer for Image Restoration
The paper introduces the Hybrid Attention Transformer (HAT), a novel approach to image restoration. The authors address a key limitation of existing Transformer-based methods: they exploit only a narrow spatial range of the input for reconstruction. HAT combines channel attention with window-based self-attention and strengthens interaction across window boundaries through an overlapping cross-attention module. This design activates more input pixels, thereby improving restoration quality.
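The overlapping-window idea can be illustrated with a toy extraction routine: queries come from ordinary non-overlapping windows, while keys and values are drawn from enlarged, overlapping windows, so each window can attend beyond its own border. This is a minimal NumPy sketch with illustrative sizes, not the paper's implementation or hyper-parameters:

```python
import numpy as np

def overlapping_windows(x, win=4, overlap=2):
    """Extract key/value windows larger than the query windows, so each
    window's queries can attend to pixels beyond its own border.
    x: (H, W) single-channel map; sizes are illustrative only."""
    pad = overlap // 2
    xp = np.pad(x, pad, mode="edge")  # pad so enlarged windows fit at edges
    h, w = x.shape
    out = []
    for i in range(0, h, win):
        for j in range(0, w, win):
            # each enlarged window covers win + overlap pixels per side
            out.append(xp[i:i + win + overlap, j:j + win + overlap])
    return np.stack(out)

x = np.arange(64, dtype=float).reshape(8, 8)
kv = overlapping_windows(x, win=4, overlap=2)
print(kv.shape)  # (4, 6, 6): four windows, each enlarged from 4x4 to 6x6
```

Each 6x6 key/value window fully contains its corresponding 4x4 query window plus a one-pixel ring of neighboring context, which is what enables cross-window information flow.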
Key Contributions
- Integration of Attention Mechanisms: The Hybrid Attention Transformer combines channel attention with self-attention mechanisms in a window-based context. This blending leverages global information to complement self-attention's local feature representation, thereby expanding the range of utilized information for improved reconstruction.
- Overlapping Cross-Attention Module: The overlapping cross-attention module addresses the limited cross-window interaction of the shifted-window scheme used in standard window-based Transformers. By letting queries in each window attend to keys and values drawn from larger, overlapping windows, the method strengthens cross-window information exchange and yields better feature aggregation.
- Same-Task Pre-Training Strategy: Unlike prior approaches that rely on multi-task pre-training, HAT adopts a same-task pre-training strategy on a larger dataset for the target task. This strategy more effectively unlocks the Transformer's potential, yielding significant performance improvements across benchmarks.
- Extensive Benchmarking: HAT is evaluated against state-of-the-art methods on tasks including image super-resolution, Gaussian image denoising, and compression artifact reduction, and it consistently achieves superior results both quantitatively and qualitatively.
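The channel-attention half of the hybrid block can be sketched in squeeze-and-excitation style: globally pool each channel, pass the descriptor through a bottleneck MLP, and rescale channels by the resulting gate. The weights, reduction ratio, and function name below are illustrative, not the paper's exact channel attention block:

```python
import numpy as np

def channel_attention(feat, reduction=4, rng=None):
    """Squeeze-and-excitation style channel attention (a sketch, not the
    paper's exact block). feat: (C, H, W) feature map."""
    rng = np.random.default_rng(0) if rng is None else rng
    c = feat.shape[0]
    # Squeeze: global average pooling over spatial dims -> (C,) descriptor
    desc = feat.mean(axis=(1, 2))
    # Excite: two-layer bottleneck MLP (random, untrained weights here)
    w1 = rng.standard_normal((c // reduction, c)) * 0.1
    w2 = rng.standard_normal((c, c // reduction)) * 0.1
    hidden = np.maximum(w1 @ desc, 0.0)           # ReLU
    scale = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))  # sigmoid gate in (0, 1)
    # Rescale each channel by its gate; this injects a global statistic
    # that window-based self-attention alone does not see
    return feat * scale[:, None, None]

feat = np.ones((8, 4, 4))
out = channel_attention(feat)
print(out.shape)  # (8, 4, 4)
```

The global pooling step is the key design choice: it gives every spatial position access to a statistic computed over the whole feature map, complementing the local window-based self-attention.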
Results and Implications
HAT demonstrates a pronounced improvement in image restoration, surpassing existing state-of-the-art methods with gains of 0.3 dB to 1.2 dB across multiple benchmark datasets. Its ability to engage more input pixels for reconstruction translates into enhanced visual quality, with fewer artifacts and clearer textures.
The model's integration of complementary attention mechanisms, together with its targeted pre-training strategy, offers insight into overcoming the spatial-information limitations of Transformers and could guide future work on multi-level attention structures for broader vision tasks.
Future Prospects
The methodology has potential applications beyond image restoration; its principles might be adapted to other areas of computer vision and AI where effective information integration across regions is critical. HAT is a stepping stone toward more sophisticated neural architectures that fully exploit attention mechanisms at scale. Further exploration could involve scaling the model, modifying the attention strategies, or integrating additional data modalities to improve performance across domains.
Conclusion
The Hybrid Attention Transformer introduces an architectural approach that addresses key challenges in image restoration through an effective combination of attention mechanisms. The paper provides compelling evidence that such integrative designs, alongside strategic pre-training, significantly advance the state of the art in image restoration.