Recursive Generalization Transformer for Image Super-Resolution (2303.06373v4)
Abstract: Transformer architectures have exhibited remarkable performance in image super-resolution (SR). Because of the quadratic computational complexity of self-attention (SA) in Transformers, existing methods tend to apply SA within local regions to reduce overhead. However, this local design restricts the exploitation of global context, which is crucial for accurate image reconstruction. In this work, we propose the Recursive Generalization Transformer (RGT) for image SR, which can capture global spatial information and is suitable for high-resolution images. Specifically, we propose recursive-generalization self-attention (RG-SA). It recursively aggregates input features into representative feature maps and then uses cross-attention to extract global information. Meanwhile, the channel dimensions of the attention matrices (query, key, and value) are further scaled down to mitigate redundancy in the channel domain. Furthermore, we combine RG-SA with local self-attention to enhance the exploitation of global context, and propose hybrid adaptive integration (HAI) for module integration. HAI allows direct and effective fusion of features at different levels (local or global). Extensive experiments demonstrate that our RGT outperforms recent state-of-the-art methods both quantitatively and qualitatively. Code and pre-trained models are available at https://github.com/zhengchen1999/RGT.
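The core idea of RG-SA, as described above, is to shrink the key/value side of attention: input features are recursively pooled into a small "representative" map, full-resolution queries cross-attend to it, and the projection matrices reduce the channel dimension. The toy NumPy sketch below illustrates that mechanism only; all function names, the pooling choice (average pooling), and the channel-halving projections are illustrative assumptions, not the paper's actual implementation (see the linked repository for that).

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def recursive_aggregate(x, factor=2, steps=2):
    """Recursively average-pool (H, W, C) features into a small
    representative map; a stand-in for the paper's learned aggregation."""
    for _ in range(steps):
        H, W, C = x.shape
        H2, W2 = H // factor, W // factor
        x = (x[:H2 * factor, :W2 * factor]
             .reshape(H2, factor, W2, factor, C)
             .mean(axis=(1, 3)))
    return x

def rg_sa(x, wq, wk, wv, steps=2):
    """Cross-attention: full-resolution queries attend to keys/values
    built from the aggregated map. wq/wk/wv project C -> C//2 channels,
    mimicking the channel-dimension scaling in the abstract."""
    H, W, C = x.shape
    rep = recursive_aggregate(x, steps=steps)     # representative map
    q = x.reshape(-1, C) @ wq                     # (H*W, C//2) queries
    k = rep.reshape(-1, C) @ wk                   # few keys -> cheap attention
    v = rep.reshape(-1, C) @ wv
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1]))
    return (attn @ v).reshape(H, W, -1)

# Toy usage: 16x16 feature map with 8 channels, projected to 4.
rng = np.random.default_rng(0)
x = rng.standard_normal((16, 16, 8))
wq, wk, wv = (rng.standard_normal((8, 4)) * 0.1 for _ in range(3))
out = rg_sa(x, wq, wk, wv)   # shape (16, 16, 4)
```

Note the cost structure: the attention matrix here is (256 queries x 16 keys) rather than (256 x 256), which is why aggregating the key/value side keeps global attention affordable on high-resolution inputs.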
Authors: Zheng Chen, Yulun Zhang, Jinjin Gu, Linghe Kong, Xiaokang Yang