Degradation-Aware Self-Attention Based Transformer for Blind Image Super-Resolution (2310.04180v1)
Abstract: Compared with CNN-based methods, Transformer-based methods achieve impressive image restoration results thanks to their ability to model long-range dependencies. However, how to apply Transformer-based methods to blind super-resolution (SR), and how to make an SR network adaptive to degradation information, remain open problems. In this paper, we propose a new degradation-aware self-attention-based Transformer model, in which we incorporate contrastive learning into the Transformer network to learn the degradation representations of input images with unknown noise. In particular, we integrate both CNN and Transformer components into the SR network: we first use a CNN modulated by the degradation information to extract local features, and then employ a degradation-aware Transformer to extract global semantic features. We evaluate the proposed model on several popular large-scale benchmark datasets and achieve state-of-the-art performance compared with existing methods. Notably, our method yields a PSNR of 32.43 dB on the Urban100 dataset at $\times$2 scale, 0.94 dB higher than DASR, and 26.62 dB on the Urban100 dataset at $\times$4 scale, a 0.26 dB improvement over KDSR, setting a new benchmark in this area. Source code is available at: https://github.com/I2-Multimedia-Lab/DSAT/tree/main.
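The abstract outlines a three-stage design: (i) a contrastive encoder that distills a degradation representation from the low-resolution input, (ii) CNN blocks modulated by that representation to extract local features, and (iii) a degradation-aware Transformer for global features. Below is a minimal PyTorch sketch of how such a pipeline could be wired together, assuming a MoCo-style InfoNCE objective as in the cited contrastive-learning works; all module names, dimensions, and the exact conditioning scheme are illustrative assumptions, not the authors' released implementation.

```python
# Illustrative sketch only: names, dimensions, and the conditioning scheme
# are assumptions, not the authors' code (see their repository for that).
import torch
import torch.nn as nn
import torch.nn.functional as F

class DegradationEncoder(nn.Module):
    """Maps an LR patch to a degradation embedding; trained contrastively so
    patches from the same image (same degradation) attract, others repel."""
    def __init__(self, dim=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, dim, 3, stride=2, padding=1), nn.LeakyReLU(0.1),
            nn.Conv2d(dim, dim, 3, stride=2, padding=1), nn.LeakyReLU(0.1),
            nn.AdaptiveAvgPool2d(1),
        )
        self.proj = nn.Linear(dim, dim)  # projection head for the contrastive loss

    def forward(self, x):
        feat = self.body(x).flatten(1)                    # (B, dim) embedding
        return feat, F.normalize(self.proj(feat), dim=1)  # embedding + projection

def info_nce(q, k_pos, k_neg, tau=0.07):
    """InfoNCE loss over normalized projections (MoCo-style)."""
    l_pos = (q * k_pos).sum(dim=1, keepdim=True)   # (B, 1) positive logits
    l_neg = q @ k_neg.t()                          # (B, N) negative logits
    logits = torch.cat([l_pos, l_neg], dim=1) / tau
    labels = torch.zeros(q.size(0), dtype=torch.long, device=q.device)
    return F.cross_entropy(logits, labels)         # positives sit at index 0

class ModulatedConvBlock(nn.Module):
    """Local feature extractor: a conv whose channels are rescaled by the
    degradation embedding, so filtering adapts to the estimated degradation."""
    def __init__(self, channels=64, deg_dim=64):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.scale = nn.Linear(deg_dim, channels)  # per-channel modulation weights

    def forward(self, x, deg):
        s = self.scale(deg)[..., None, None]       # (B, C, 1, 1)
        return F.leaky_relu(self.conv(x) * s, 0.1)

class DegradationAwareAttention(nn.Module):
    """Toy self-attention whose query/key projections are shifted by the
    degradation embedding -- one plausible way to make attention
    'degradation-aware'; the paper's exact formulation may differ."""
    def __init__(self, dim=64, deg_dim=64, heads=4):
        super().__init__()
        self.heads = heads
        self.qkv = nn.Linear(dim, dim * 3)
        self.deg_bias = nn.Linear(deg_dim, dim * 2)  # additive biases for Q and K
        self.out = nn.Linear(dim, dim)

    def forward(self, x, deg):                     # x: (B, N, dim) token sequence
        B, N, C = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        bq, bk = self.deg_bias(deg).unsqueeze(1).chunk(2, dim=-1)
        q, k = q + bq, k + bk                      # condition attention on degradation

        def split(t):                              # (B, N, C) -> (B, h, N, C/h)
            return t.view(B, N, self.heads, C // self.heads).transpose(1, 2)

        attn = (split(q) @ split(k).transpose(-2, -1)) * (C // self.heads) ** -0.5
        y = (attn.softmax(dim=-1) @ split(v)).transpose(1, 2).reshape(B, N, C)
        return self.out(y)
```

In such a setup, two patches cropped from the same LR image would serve as the query/positive pair for `info_nce`, with patches from other images in the batch acting as negatives; at inference time only the degradation embedding and the SR branches would be needed, not the contrastive head.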
- W. Yang, X. Zhang, Y. Tian, W. Wang, J.-H. Xue, and Q. Liao, “Deep learning for single image super-resolution: A brief review,” IEEE Trans. Multimedia, vol. 21, no. 12, pp. 3106–3121, Dec. 2019.
- Z. Li, J. Yang, Z. Liu, X. Yang, G. Jeon, and W. Wu, “Feedback network for image super-resolution,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Long Beach, California, Jun. 2019, pp. 3867–3876.
- M. Fritsche, S. Gu, and R. Timofte, “Frequency separation for real-world super-resolution,” in Proc. IEEE/CVF Int. Conf. Comput. Vis. Workshops. Seoul, Korea: IEEE, 2019, pp. 3599–3608.
- K. Zhang, Y. Li, W. Zuo, L. Zhang, L. Van Gool, and R. Timofte, “Plug-and-play image restoration with deep denoiser prior,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 10, pp. 6360–6376, Oct. 2021.
- K. Zhang, J. Liang, L. Van Gool, and R. Timofte, “Designing a practical degradation model for deep blind image super-resolution,” in Proc. IEEE/CVF Int. Conf. Comput. Vis., Virtual, Oct. 2021, pp. 4791–4800.
- J. Kim, J. K. Lee, and K. M. Lee, “Accurate image super-resolution using very deep convolutional networks,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Las Vegas, Nevada, Jun. 2016, pp. 1646–1654.
- J. Gu, H. Lu, W. Zuo, and C. Dong, “Blind super-resolution with iterative kernel correction,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Long Beach, California, Jun. 2019, pp. 1604–1613.
- Z. Luo, H. Huang, L. Yu, Y. Li, H. Fan, and S. Liu, “Deep constrained least squares for blind image super-resolution,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., New Orleans, Louisiana, Jun. 2022, pp. 17642–17652.
- L. Wang, Y. Wang, X. Dong, Q. Xu, J. Yang, W. An, and Y. Guo, “Unsupervised degradation representation learning for blind super-resolution,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Virtual, Jun. 2021, pp. 10581–10590.
- B. Xia, Y. Zhang, Y. Wang, Y. Tian, W. Yang, R. Timofte, and L. Van Gool, “Knowledge distillation based degradation estimation for blind super-resolution,” in Proc. Int. Conf. Learn. Represent., 2023, pp. 1–12.
- J. Gu and C. Dong, “Interpreting super-resolution networks with local attribution maps,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Virtual, Jun. 2021, pp. 9199–9208.
- Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo, “Swin transformer: Hierarchical vision transformer using shifted windows,” in Proc. IEEE/CVF Int. Conf. Comput. Vis., Virtual, Oct. 2021, pp. 10012–10022.
- J. Liang, J. Cao, G. Sun, K. Zhang, L. Van Gool, and R. Timofte, “Swinir: Image restoration using swin transformer,” in Proc. IEEE/CVF Int. Conf. Comput. Vis., Virtual, Oct. 2021, pp. 1833–1844.
- K. He, H. Fan, Y. Wu, S. Xie, and R. Girshick, “Momentum contrast for unsupervised visual representation learning,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Virtual, Jun. 2020, pp. 9729–9738.
- X. Wang, K. Yu, S. Wu, J. Gu, Y. Liu, C. Dong, Y. Qiao, and C. C. Loy, “Esrgan: Enhanced super-resolution generative adversarial networks,” in Proc. Eur. Conf. Comput. Vis. Workshops, L. Leal-Taixé and S. Roth, Eds. Cham: Springer International Publishing, Sep. 2019, pp. 63–79.
- B. Lim, S. Son, H. Kim, S. Nah, and K. M. Lee, “Enhanced deep residual networks for single image super-resolution,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. Workshops, Honolulu, Hawaii, Jul. 2017, pp. 136–144.
- T. Tong, G. Li, X. Liu, and Q. Gao, “Image super-resolution using dense skip connections,” in Proc. IEEE Int. Conf. Comput. Vis., Venice, Italy, Oct. 2017, pp. 4799–4807.
- J. Kim, J. K. Lee, and K. M. Lee, “Deeply-recursive convolutional network for image super-resolution,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Las Vegas, Nevada, Jun. 2016, pp. 1637–1645.
- Y. Tai, J. Yang, and X. Liu, “Image super-resolution via deep recursive residual network,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Honolulu, Hawaii, Jul. 2017, pp. 3147–3155.
- Y. Zhang, K. Li, K. Li, L. Wang, B. Zhong, and Y. Fu, “Image super-resolution using very deep residual channel attention networks,” in Proc. Eur. Conf. Comput. Vis., Munich, Germany, Sep. 2018, pp. 286–301.
- T. Dai, J. Cai, Y. Zhang, S.-T. Xia, and L. Zhang, “Second-order attention network for single image super-resolution,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Long Beach, California, Jun. 2019, pp. 11065–11074.
- W. Shi, J. Caballero, F. Huszár, J. Totz, A. P. Aitken, R. Bishop, D. Rueckert, and Z. Wang, “Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Las Vegas, Nevada, Jun. 2016, pp. 1874–1883.
- J.-S. Yoo, D.-W. Kim, Y. Lu, and S.-W. Jung, “Rzsr: Reference-based zero-shot super-resolution with depth guided self-exemplars,” IEEE Trans. Multimedia, pp. 1–13, 2022, to be published.
- H. Qi, Y. Qiu, X. Luo, and Z. Jin, “An efficient latent style guided transformer-cnn framework for face super-resolution,” IEEE Trans. Multimedia, pp. 1–11, 2023, to be published.
- N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, and S. Zagoruyko, “End-to-end object detection with transformers,” in Proc. Eur. Conf. Comput. Vis. Virtual: Springer, Nov. 2020, pp. 213–229.
- L. Liu, W. Ouyang, X. Wang, P. Fieguth, J. Chen, X. Liu, and M. Pietikäinen, “Deep learning for generic object detection: A survey,” Int. J. Comput. Vis., vol. 128, no. 2, pp. 261–318, 2020.
- B. Wu, C. Xu, X. Dai, A. Wan, P. Zhang, Z. Yan, M. Tomizuka, J. Gonzalez, K. Keutzer, and P. Vajda, “Visual transformers: Token-based image representation and processing for computer vision,” arXiv preprint arXiv:2006.03677, 2020.
- S. Zheng, J. Lu, H. Zhao, X. Zhu, Z. Luo, Y. Wang, Y. Fu, J. Feng, T. Xiang, P. H. Torr et al., “Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Virtual, Jun. 2021, pp. 6881–6890.
- H. Cao, Y. Wang, J. Chen, D. Jiang, X. Zhang, Q. Tian, and M. Wang, “Swin-unet: Unet-like pure transformer for medical image segmentation,” arXiv preprint arXiv:2105.05537, 2021.
- D. Liang, X. Chen, W. Xu, Y. Zhou, and X. Bai, “Transcrowd: weakly-supervised crowd counting with transformers,” Science China Information Sciences, vol. 65, no. 6, pp. 1–14, Apr. 2022.
- G. Sun, Y. Liu, T. Probst, D. P. Paudel, N. Popovic, and L. Van Gool, “Boosting crowd counting with transformers,” arXiv preprint arXiv:2105.10926, 2021.
- H. Chen, Y. Wang, T. Guo, C. Xu, Y. Deng, Z. Liu, S. Ma, C. Xu, C. Xu, and W. Gao, “Pre-trained image processing transformer,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Virtual, Jun. 2021, pp. 12299–12310.
- Z. Wang, X. Cun, J. Bao, W. Zhou, J. Liu, and H. Li, “Uformer: A general u-shaped transformer for image restoration,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., New Orleans, Louisiana, Jun. 2022, pp. 17683–17693.
- J. Cao, Y. Li, K. Zhang, and L. Van Gool, “Video super-resolution transformer,” arXiv preprint arXiv:2106.06847, 2021.
- J. Shi, Y. Wang, Z. Yu, G. Li, X. Hong, F. Wang, and Y. Gong, “Exploiting multi-scale parallel self-attention and local variation via dual-branch transformer-cnn structure for face super-resolution,” IEEE Trans. Multimedia, pp. 1–14, 2023, to be published.
- Y. Tian, D. Krishnan, and P. Isola, “Contrastive multiview coding,” in Proc. Eur. Conf. Comput. Vis. Virtual: Springer, 2020, pp. 776–794.
- T. Chen, S. Kornblith, M. Norouzi, and G. Hinton, “A simple framework for contrastive learning of visual representations,” in Int. Conf. Mach. Learn. Virtual: PMLR, 2020, pp. 1597–1607.
- S. Bell-Kligler, A. Shocher, and M. Irani, “Blind super-resolution kernel estimation using an internal-gan,” Adv. Neural Inform. Process. Syst., vol. 32, 2019.
- K. Zhang, L. V. Gool, and R. Timofte, “Deep unfolding network for image super-resolution,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Virtual, Jun. 2020, pp. 3217–3226.
- C. Dyer, “Notes on noise contrastive estimation and negative sampling,” arXiv preprint arXiv:1410.8251, 2014.
- K. Zhang, W. Zuo, and L. Zhang, “Learning a single convolutional super-resolution network for multiple degradations,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Salt Lake City, Utah, Jun. 2018, pp. 3262–3271.
- J. W. Soh, S. Cho, and N. I. Cho, “Meta-transfer learning for zero-shot super-resolution,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Virtual, Jun. 2020, pp. 3516–3525.