Neural Video Compression with Feature Modulation (2402.17414v2)

Published 27 Feb 2024 in cs.CV and eess.IV

Abstract: The emerging conditional coding-based neural video codec (NVC) shows superiority over the commonly used residual coding-based codec, and the latest NVC already claims to outperform the best traditional codec. However, critical problems still block the practicality of NVC. In this paper, we propose a powerful conditional coding-based NVC that solves two critical problems via feature modulation. The first is how to support a wide quality range in a single model. Previous NVC with this capability supports only about a 3.8 dB PSNR range on average. To tackle this limitation, we modulate the latent feature of the current frame via a learnable quantization scaler. During training, we specially design a uniform quantization parameter sampling mechanism to improve the harmonization of encoding and quantization. This results in better learning of the quantization scaler and helps our NVC support about an 11.4 dB PSNR range. The second is how to make NVC still work under a long prediction chain. We show that the previous SOTA NVC has an obvious quality degradation problem when using a large intra-period setting. To this end, we propose modulating the temporal feature with a periodically refreshing mechanism to boost the quality. Besides solving the above two problems, we also design a single model that can support both RGB and YUV colorspaces. Notably, under the single intra-frame setting, our codec can achieve 29.7% bitrate saving over the previous SOTA NVC with a 16% MACs reduction. Our codec serves as a notable landmark in the journey of NVC evolution. The code is available at https://github.com/microsoft/DCVC.
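The two mechanisms named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the number of quantization parameter (QP) indices, the scaler values, the refresh period, and every function name below are hypothetical, and the scalers are fixed here, whereas in the paper they are learned. The sketch only shows the general idea of scaling a latent before rounding, sampling the QP index uniformly during training, and refreshing a temporal feature on a fixed schedule.

```python
import random

NUM_QP = 64  # hypothetical number of QP indices (assumption, not from the abstract)


def make_scalers(q_min=0.5, q_max=8.0, num_qp=NUM_QP):
    """Build one scaler per QP index by geometric interpolation.

    In a trained codec these values would be learned; here they are
    fixed placeholders chosen only for illustration.
    """
    ratio = (q_max / q_min) ** (1.0 / (num_qp - 1))
    return [q_min * ratio ** i for i in range(num_qp)]


def modulate_and_quantize(latent, scaler):
    """Divide each latent value by the scaler, then round to an integer symbol."""
    return [round(v / scaler) for v in latent]


def dequantize(symbols, scaler):
    """Undo the scaling on the decoder side (the rounding loss remains)."""
    return [s * scaler for s in symbols]


def sample_qp(rng, num_qp=NUM_QP):
    """Uniformly sample a QP index for one training step."""
    return rng.randrange(num_qp)


def should_refresh(frame_index, refresh_period=32):
    """Hypothetical schedule: refresh the temporal feature every refresh_period frames."""
    return frame_index > 0 and frame_index % refresh_period == 0


rng = random.Random(0)
scalers = make_scalers()
latent = [0.3, -1.7, 4.2, 9.9]

# A small scaler keeps more detail (higher quality, more bits);
# a large scaler quantizes coarsely (lower quality, fewer bits).
fine = dequantize(modulate_and_quantize(latent, scalers[0]), scalers[0])
coarse = dequantize(modulate_and_quantize(latent, scalers[-1]), scalers[-1])
```

In this toy setup the reconstruction error of each value is bounded by half the scaler, so sweeping the QP index trades quality against rate within a single set of parameters, which is the behavior the abstract attributes to the quantization scaler.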
