
Learned Video Compression via Heterogeneous Deformable Compensation Network (2207.04589v3)

Published 11 Jul 2022 in eess.IV, cs.CV, and cs.MM

Abstract: Learned video compression has recently emerged as an essential research topic in developing advanced video compression technologies, where motion compensation is considered one of the most challenging issues. In this paper, we propose a learned video compression framework with a heterogeneous deformable compensation strategy (HDCVC) to tackle the problem of unstable compression performance caused by single-size deformable kernels in the downsampled feature domain. More specifically, instead of utilizing optical-flow warping or single-size-kernel deformable alignment, the proposed algorithm extracts features from the two adjacent frames to estimate content-adaptive heterogeneous deformable (HetDeform) kernel offsets. Then we transform the reference features with the HetDeform convolution to accomplish motion compensation. Moreover, we design a Spatial-Neighborhood-Conditioned Divisive Normalization (SNCDN) to achieve more effective data Gaussianization combined with the Generalized Divisive Normalization. Furthermore, we propose a multi-frame enhanced reconstruction module that exploits context and temporal information for final quality enhancement. Experimental results indicate that HDCVC achieves superior performance compared to recent state-of-the-art learned video compression approaches.
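
To make the compensation step concrete, below is a minimal PyTorch sketch of feature-domain deformable motion compensation with mixed kernel sizes. It is an illustration under stated assumptions, not the authors' implementation: the paper's HetDeform convolution mixes kernel shapes within a single operator, whereas this sketch approximates the idea with two parallel deformable branches (3x3 and 5x5) whose offsets are predicted from the concatenated features of the two adjacent frames. All module names, channel counts, and the 1x1 fusion layer are hypothetical.

```python
# Illustrative sketch (assumption, not the authors' code): feature-domain motion
# compensation with deformable convolutions of two kernel sizes, loosely
# mimicking heterogeneous deformable (HetDeform) compensation.
import torch
import torch.nn as nn
from torchvision.ops import deform_conv2d


class HeteroDeformCompensation(nn.Module):
    """Warps reference features toward the current frame using content-adaptive
    offsets and a mix of 3x3 and 5x5 deformable kernels (illustrative only)."""

    def __init__(self, channels: int = 64):
        super().__init__()
        # Offsets are predicted from the concatenated reference/current features:
        # 2 * kH * kW offset channels per deformable branch.
        self.offset3 = nn.Conv2d(2 * channels, 2 * 3 * 3, kernel_size=3, padding=1)
        self.offset5 = nn.Conv2d(2 * channels, 2 * 5 * 5, kernel_size=3, padding=1)
        # Deformable kernel weights for the two branches.
        self.w3 = nn.Parameter(torch.randn(channels, channels, 3, 3) * 0.01)
        self.w5 = nn.Parameter(torch.randn(channels, channels, 5, 5) * 0.01)
        # Hypothetical fusion of the two aligned feature maps.
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, f_ref: torch.Tensor, f_cur: torch.Tensor) -> torch.Tensor:
        x = torch.cat([f_ref, f_cur], dim=1)
        # Branch 1: 3x3 deformable alignment of the reference features.
        aligned3 = deform_conv2d(f_ref, self.offset3(x), self.w3, padding=1)
        # Branch 2: 5x5 deformable alignment, covering larger displacements.
        aligned5 = deform_conv2d(f_ref, self.offset5(x), self.w5, padding=2)
        # Fuse the two kernel sizes into the compensated feature map.
        return self.fuse(torch.cat([aligned3, aligned5], dim=1))


# Usage: compensate 64-channel features extracted from two adjacent frames.
comp = HeteroDeformCompensation(channels=64)
f_ref, f_cur = torch.randn(1, 64, 64, 96), torch.randn(1, 64, 64, 96)
f_comp = comp(f_ref, f_cur)
print(f_comp.shape)  # torch.Size([1, 64, 64, 96])
```

The compensated features f_comp would then feed the residual coding and reconstruction stages described in the abstract; how offsets are conditioned and how kernel shapes are mixed in the actual HetDeform operator is specific to the paper.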

Authors (3)
  1. Huairui Wang (5 papers)
  2. Zhenzhong Chen (61 papers)
  3. Chang Wen Chen (58 papers)
Citations (4)
