Compressing Deep Image Super-resolution Models (2401.00523v2)
Abstract: Deep learning techniques have been applied in the context of image super-resolution (SR), achieving remarkable advances in terms of reconstruction performance. Existing techniques typically employ highly complex model structures which result in large model sizes and slow inference speeds. This often leads to high energy consumption and restricts their adoption for practical applications. To address this issue, this work employs a three-stage workflow for compressing deep SR models which significantly reduces their memory requirements. Restoration performance has been maintained through teacher-student knowledge distillation using a newly designed distillation loss. We have applied this approach to two popular image super-resolution networks, SwinIR and EDSR, to demonstrate its effectiveness. The resulting compact models, SwinIRmini and EDSRmini, achieve 89% and 96% reductions, respectively, in both model size and floating-point operations (FLOPs) compared to their original versions. They also retain competitive super-resolution performance relative to their original models and other commonly used SR approaches. The source code and pre-trained models for these two lightweight SR approaches are released at https://pikapi22.github.io/CDISM/.
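The abstract does not specify the form of the newly designed distillation loss. As a generic illustration only, teacher-student distillation for SR is commonly formulated as a weighted sum of a reconstruction term (student output vs. ground truth) and a distillation term (student output vs. teacher output). The sketch below assumes this standard formulation; the function name `kd_loss` and the weight `alpha` are illustrative, not the paper's actual design.

```python
import numpy as np

def kd_loss(student_sr, teacher_sr, ground_truth, alpha=0.5):
    """Illustrative teacher-student distillation loss for SR (not the
    paper's actual loss). Combines a mean-L1 reconstruction term
    (student vs. ground truth) with a mean-L1 distillation term
    (student vs. teacher output), weighted by `alpha`."""
    recon = np.abs(student_sr - ground_truth).mean()     # supervised term
    distill = np.abs(student_sr - teacher_sr).mean()     # teacher-guidance term
    return (1.0 - alpha) * recon + alpha * distill

# Example with toy 2x2 "images": only the student is trained against this
# loss; the teacher (e.g. the full SwinIR) runs in inference mode.
student = np.zeros((2, 2))
teacher = np.ones((2, 2))
gt = np.full((2, 2), 0.5)
loss = kd_loss(student, teacher, gt, alpha=0.5)  # 0.5*0.5 + 0.5*1.0 = 0.75
```

In practice the weighted sum lets the compact student benefit from the smoother supervisory signal of the teacher's outputs while still being anchored to the ground-truth high-resolution images.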
- Z. Wang, J. Chen, and S. C. Hoi, “Deep learning for image super-resolution: A survey,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 43, no. 10, pp. 3365–3387, 2020.
- D. Bull and F. Zhang, Intelligent image and video compression: communicating pictures. Academic Press, 2021.
- M. Afonso, F. Zhang, and D. R. Bull, “Video compression based on spatio-temporal resolution adaptation,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 29, no. 1, pp. 275–280, 2018.
- F. Zhang, M. Afonso, and D. R. Bull, “ViSTRA2: Video coding using spatial resolution and effective bit depth adaptation,” Signal Processing: Image Communication, vol. 97, p. 116355, 2021.
- J. Yang, J. Wright, T. S. Huang, and Y. Ma, “Image super-resolution via sparse representation,” IEEE Transactions on Image Processing, vol. 19, no. 11, pp. 2861–2873, 2010.
- S. Schulter, C. Leistner, and H. Bischof, “Fast and accurate image upscaling with super-resolution forests,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3791–3799, 2015.
- M. Afonso, F. Zhang, A. Katsenou, D. Agrafiotis, and D. Bull, “Low complexity video coding based on spatial resolution adaptation,” in 2017 IEEE International Conference on Image Processing (ICIP), pp. 3011–3015, IEEE, 2017.
- C. Dong, C. C. Loy, K. He, and X. Tang, “Image super-resolution using deep convolutional networks,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 2, pp. 295–307, 2015.
- J. Kim, J. K. Lee, and K. M. Lee, “Accurate image super-resolution using very deep convolutional networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1646–1654, 2016.
- B. Lim, S. Son, H. Kim, S. Nah, and K. Mu Lee, “Enhanced deep residual networks for single image super-resolution,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 136–144, 2017.
- Y. Zhang, K. Li, K. Li, L. Wang, B. Zhong, and Y. Fu, “Image super-resolution using very deep residual channel attention networks,” in Proceedings of the European Conference on Computer Vision (ECCV), pp. 286–301, 2018.
- D. Ma, F. Zhang, and D. R. Bull, “CVEGAN: A perceptually-inspired GAN for compressed video enhancement,” arXiv preprint arXiv:2011.09190, 2020.
- D. Ma, F. Zhang, and D. R. Bull, “MFRNet: a new CNN architecture for post-processing and in-loop filtering,” IEEE Journal of Selected Topics in Signal Processing, vol. 15, no. 2, pp. 378–387, 2020.
- J. Liang, J. Cao, Y. Fan, K. Zhang, R. Ranjan, Y. Li, R. Timofte, and L. Van Gool, “VRT: A video restoration transformer,” arXiv preprint arXiv:2201.12288, 2022.
- Z. Wang, X. Cun, J. Bao, W. Zhou, J. Liu, and H. Li, “Uformer: A general U-shaped transformer for image restoration,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 17683–17693, 2022.
- J. Liang, J. Cao, G. Sun, K. Zhang, L. Van Gool, and R. Timofte, “SwinIR: Image restoration using Swin transformer,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 1833–1844, 2021.
- M. V. Conde, U.-J. Choi, M. Burchi, and R. Timofte, “Swin2SR: Swinv2 transformer for compressed image super-resolution and restoration,” in European Conference on Computer Vision, pp. 669–687, Springer, 2022.
- E. Zamfir, M. V. Conde, and R. Timofte, “Towards real-time 4k image super-resolution,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1522–1532, 2023.
- A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” Advances in Neural Information Processing Systems, vol. 30, 2017.
- J. Guo, X. Zou, Y. Chen, Y. Liu, J. Liu, Y. Yan, and J. Hao, “AsConvSR: Fast and lightweight super-resolution network with assembled convolutions,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1582–1592, 2023.
- X. Chu, B. Zhang, H. Ma, R. Xu, and Q. Li, “Fast, accurate and lightweight super-resolution with neural architecture search,” in 2020 25th International Conference on Pattern Recognition (ICPR), pp. 59–64, IEEE, 2021.
- Y. Cheng, D. Wang, P. Zhou, and T. Zhang, “A survey of model compression and acceleration for deep neural networks,” arXiv preprint arXiv:1710.09282, 2017.
- M. Zhu and S. Gupta, “To prune, or not to prune: exploring the efficacy of pruning for model compression,” arXiv preprint arXiv:1710.01878, 2017.
- F. Kong, M. Li, S. Liu, D. Liu, J. He, Y. Bai, F. Chen, and L. Fu, “Residual local feature network for efficient super-resolution,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 766–776, 2022.
- G. Hinton, O. Vinyals, and J. Dean, “Distilling the knowledge in a neural network,” arXiv preprint arXiv:1503.02531, 2015.
- P. Chen, S. Liu, H. Zhao, and J. Jia, “Distilling knowledge via knowledge review,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5008–5017, 2021.
- Y. Jin, J. Wang, and D. Lin, “Multi-level logit distillation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 24276–24285, 2023.
- L. Beyer, X. Zhai, A. Royer, L. Markeeva, R. Anil, and A. Kolesnikov, “Knowledge distillation: A good teacher is patient and consistent,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10925–10934, 2022.
- H. Fang, X. Hu, and H. Hu, “Cross knowledge distillation for image super-resolution,” in Proceedings of the 2022 6th International Conference on Video and Image Processing, pp. 162–168, 2022.
- H. M. Kwan, G. Gao, F. Zhang, A. Gower, and D. Bull, “HiNeRV: Video compression with hierarchical encoding based neural representation,” arXiv preprint arXiv:2306.09818, 2023.
- T. Ding, L. Liang, Z. Zhu, and I. Zharkov, “CDFI: Compression-driven network design for frame interpolation,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 8001–8011, 2021.
- C. Morris, D. Danier, F. Zhang, N. Anantrasirichai, and D. R. Bull, “ST-MFNet Mini: Knowledge distillation-driven frame interpolation,” arXiv preprint arXiv:2302.08455, 2023.
- T. Chen, T. Ding, B. Ji, G. Wang, Y. Shi, J. Tian, S. Yi, X. Tu, and Z. Zhu, “Orthant based proximal stochastic gradient method for ℓ1-regularized optimization,” in Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2020, Ghent, Belgium, September 14–18, 2020, Proceedings, Part III, pp. 57–73, Springer, 2021.
- S. Niklaus and F. Liu, “Context-aware synthesis for video frame interpolation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1701–1710, 2018.
- E. Agustsson and R. Timofte, “NTIRE 2017 challenge on single image super-resolution: Dataset and study,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 126–135, 2017.
- M. Bevilacqua, A. Roumy, C. Guillemot, and M. L. Alberi-Morel, “Low-complexity single-image super-resolution based on nonnegative neighbor embedding,” in Proceedings of the British Machine Vision Conference (BMVC), 2012.
- R. Zeyde, M. Elad, and M. Protter, “On single image scale-up using sparse-representations,” in Curves and Surfaces: 7th International Conference, Avignon, France, June 24-30, 2010, Revised Selected Papers 7, pp. 711–730, Springer, 2012.
- Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: From error visibility to structural similarity,” IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, 2004.
- W.-S. Lai, J.-B. Huang, N. Ahuja, and M.-H. Yang, “Deep Laplacian pyramid networks for fast and accurate super-resolution,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 624–632, 2017.
- N. Ahn, B. Kang, and K.-A. Sohn, “Fast, accurate, and lightweight super-resolution with cascading residual network,” in Proceedings of the European Conference on Computer Vision (ECCV), pp. 252–268, 2018.
- X. Luo, Y. Xie, Y. Zhang, Y. Qu, C. Li, and Y. Fu, “LatticeNet: Towards lightweight image super-resolution with lattice block,” in Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXII 16, pp. 272–289, Springer, 2020.
- Z. Hui, X. Gao, Y. Yang, and X. Wang, “Lightweight image super-resolution with information multi-distillation network,” in Proceedings of the 27th ACM International Conference on Multimedia, pp. 2024–2032, 2019.