
Multi-Scale Deformable Alignment and Content-Adaptive Inference for Flexible-Rate Bi-Directional Video Compression (2306.16544v1)

Published 28 Jun 2023 in eess.IV and cs.CV

Abstract: The inability to adapt the motion compensation model to video content is an important limitation of current end-to-end learned video compression models. This paper advances the state-of-the-art by proposing an adaptive motion-compensation model for end-to-end rate-distortion optimized hierarchical bi-directional video compression. In particular, we propose two novelties: i) a multi-scale deformable alignment scheme at the feature level combined with multi-scale conditional coding, and ii) motion-content adaptive inference. In addition, we employ a gain unit, which enables a single model to operate at multiple rate-distortion operating points. We also exploit the gain unit to control bit allocation among intra-coded vs. bi-directionally coded frames by fine-tuning the corresponding models for truly flexible-rate learned video coding. Experimental results demonstrate state-of-the-art rate-distortion performance exceeding that of all prior art in learned video coding.
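The abstract does not spell out how the gain unit works, so the following is only a minimal sketch of the general idea, under stated assumptions: each latent channel is multiplied by a gain before rounding and by an inverse gain after decoding, and intermediate rates are obtained by geometric interpolation between two gain vectors. All names (`interpolate_gain`, `quantize_with_gain`, the toy latent and gain values) are hypothetical, and the inverse gain is assumed to be simply `1 / gain`, whereas a trained model would typically learn it.

```python
# Toy illustration of a gain unit for variable-rate coding.
# Assumption: gains are per-channel scalars; inverse gain = 1 / gain.

def interpolate_gain(gain_a, gain_b, l):
    """Geometric interpolation between two gain vectors, 0 <= l <= 1.
    l = 1 returns gain_a, l = 0 returns gain_b."""
    return [(a ** l) * (b ** (1.0 - l)) for a, b in zip(gain_a, gain_b)]

def quantize_with_gain(latent, gain):
    """Scale each channel by its gain, then round to integer symbols."""
    return [round(y * g) for y, g in zip(latent, gain)]

def dequantize(symbols, gain):
    """Undo the scaling with the (assumed) inverse gain 1 / g."""
    return [q / g for q, g in zip(symbols, gain)]

def total_abs_error(latent, recon):
    """Sum of absolute reconstruction errors over channels."""
    return sum(abs(y - r) for y, r in zip(latent, recon))

latent = [0.37, -1.42, 2.08, 0.05]      # hypothetical latent values
gain_low_rate = [0.5, 0.5, 0.5, 0.5]    # coarse quantization, fewer bits
gain_high_rate = [4.0, 4.0, 4.0, 4.0]   # fine quantization, more bits

for name, gain in [("low", gain_low_rate), ("high", gain_high_rate)]:
    symbols = quantize_with_gain(latent, gain)
    recon = dequantize(symbols, gain)
    print(name, symbols, round(total_abs_error(latent, recon), 3))

# An intermediate operating point halfway (in the log domain) between the two.
mid_gain = interpolate_gain(gain_low_rate, gain_high_rate, 0.5)
print("mid", quantize_with_gain(latent, mid_gain))
```

A larger gain spreads the latent over more integer levels, which lowers distortion at the cost of more bits. A single model with several gain vectors can therefore cover several rate-distortion points, which is the property the abstract attributes to the gain unit.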

Authors (3)
  1. M. Akın Yılmaz (56 papers)
  2. O. Ugur Ulas (2 papers)
  3. A. Murat Tekalp (31 papers)
Citations (4)
