HIPA: Hierarchical Patch Transformer for Single Image Super Resolution (2203.10247v2)
Abstract: Transformer-based architectures have begun to emerge in single image super-resolution (SISR) and have achieved promising performance. Most existing Vision Transformers divide images into the same number of patches with a fixed size, which may not be optimal for restoring patches with different levels of texture richness. This paper presents HIPA, a novel Transformer architecture that progressively recovers the high-resolution image using a hierarchical patch partition. Specifically, we build a cascaded model that processes an input image in multiple stages, starting with tokens of small patch size and gradually merging up to the full resolution. This hierarchical patch mechanism not only enables explicit feature aggregation at multiple resolutions but also adaptively learns patch-aware features for different image regions, e.g., smaller patches for areas with fine details and larger patches for textureless regions. Meanwhile, we propose a new attention-based positional encoding scheme for Transformers that, to the best of our knowledge for the first time, lets the network decide which tokens deserve more attention by assigning different weights to different tokens. Furthermore, we propose a multi-receptive field attention module that enlarges the convolutional receptive field through multiple branches. Experimental results on several public datasets demonstrate that the proposed HIPA outperforms previous methods both quantitatively and qualitatively.
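The attention-based positional encoding described above can be illustrated with a minimal sketch: each token produces a scalar gate that scales its positional embedding before the two are combined. This is an assumption-laden toy in NumPy, not the paper's implementation; the function and variable names (`attention_weighted_pos_encoding`, `w_gate`) are hypothetical.

```python
import numpy as np

def attention_weighted_pos_encoding(tokens, pos_emb, w_gate):
    """Weight each token's positional embedding by a learned per-token score.

    tokens:  (n, d) token features
    pos_emb: (n, d) positional embeddings
    w_gate:  (d, 1) learned projection producing one scalar score per token
    All names are illustrative; HIPA's actual layers may differ.
    """
    scores = tokens @ w_gate                 # (n, 1) raw per-token scores
    weights = 1.0 / (1.0 + np.exp(-scores))  # sigmoid gate in [0, 1]
    # Tokens judged more important receive a stronger positional signal.
    return tokens + weights * pos_emb

# Toy usage: 4 tokens with 8-dim features
rng = np.random.default_rng(0)
tokens = rng.standard_normal((4, 8))
pos_emb = rng.standard_normal((4, 8))
w_gate = rng.standard_normal((8, 1))
out = attention_weighted_pos_encoding(tokens, pos_emb, w_gate)
print(out.shape)
```

The key contrast with a standard (fixed or learned but unweighted) positional encoding is that the amount of positional information injected varies per token, driven by the token's own features.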
Authors: Qing Cai, Yiming Qian, Jinxing Li, Jun Lv, Yee-Hong Yang, Feng Wu, David Zhang