Multi-Scale Deformable Alignment and Content-Adaptive Inference for Flexible-Rate Bi-Directional Video Compression (2306.16544v1)
Abstract: The inability to adapt the motion compensation model to video content is an important limitation of current end-to-end learned video compression models. This paper advances the state of the art by proposing an adaptive motion-compensation model for end-to-end rate-distortion optimized hierarchical bi-directional video compression. In particular, we propose two novelties: (i) a multi-scale deformable alignment scheme at the feature level, combined with multi-scale conditional coding, and (ii) motion-content adaptive inference. In addition, we employ a gain unit, which enables a single model to operate at multiple rate-distortion operating points. We also exploit the gain unit to control bit allocation between intra-coded and bi-directionally coded frames by fine-tuning the corresponding models, which yields truly flexible-rate learned video coding. Experimental results demonstrate state-of-the-art rate-distortion performance, exceeding that of all prior art in learned video coding.
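The abstract does not spell out how the gain unit works, so the following is only a minimal sketch of the general idea, under stated assumptions: a gain unit is taken to be a set of per-channel scaling vectors applied to the latent before quantization, with the inverse scaling applied after decoding, and intermediate rate points are obtained by geometric interpolation between two gain vectors. The function names, the use of an exact reciprocal as the inverse gain, and the toy latent values are all hypothetical and are not taken from the paper.

```python
# Sketch of a channel-wise gain unit (assumed design, not the paper's code).
# A latent is a list of channels; each channel is a list of floats.

def interpolate_gain(gain_lo, gain_hi, level):
    """Geometrically blend two per-channel gain vectors; level lies in [0, 1]."""
    if not 0.0 <= level <= 1.0:
        raise ValueError("level must be in [0, 1]")
    return [(a ** (1.0 - level)) * (b ** level) for a, b in zip(gain_lo, gain_hi)]

def apply_gain(latent, gain):
    """Multiply every value in channel c by gain[c]."""
    return [[g * v for v in channel] for channel, g in zip(latent, gain)]

def quantize(latent):
    """Round each value to the nearest integer symbol."""
    return [[round(v) for v in channel] for channel in latent]

def encode(latent, gain):
    """Scale the latent, then quantize. Larger gains keep finer detail."""
    return quantize(apply_gain(latent, gain))

def decode(symbols, gain):
    """Undo the scaling. Assumption: the inverse gain is the exact reciprocal."""
    return apply_gain(symbols, [1.0 / g for g in gain])

def mse(a, b):
    """Mean squared error between two latents of identical shape."""
    diffs = [(x - y) ** 2 for ca, cb in zip(a, b) for x, y in zip(ca, cb)]
    return sum(diffs) / len(diffs)

# Toy latent with two channels (hypothetical values).
latent = [[0.3, -1.2, 2.6], [0.8, 0.1, -0.4]]
gain_lo = [1.0, 0.5]   # coarse quantization: lower rate, higher distortion
gain_hi = [4.0, 2.0]   # fine quantization: higher rate, lower distortion

for level in (0.0, 0.5, 1.0):
    gain = interpolate_gain(gain_lo, gain_hi, level)
    symbols = encode(latent, gain)
    error = mse(latent, decode(symbols, gain))
    print(level, symbols, error)
```

Sweeping `level` from 0 to 1 moves a single set of weights along the rate-distortion curve. In the same spirit, choosing different levels for intra-coded and bi-directionally coded frames is one way to picture the frame-level bit allocation the abstract mentions.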
- M. Akın Yılmaz
- O. Ugur Ulas
- A. Murat Tekalp