
Uncertainty-Aware Deep Video Compression with Ensembles (2403.19158v1)

Published 28 Mar 2024 in cs.CV and eess.IV

Abstract: Deep learning-based video compression is a challenging task, and many previous state-of-the-art learned video codecs use optical flow to exploit the temporal correlation between successive frames and then compress the residual error. Although these two-stage models are optimized end to end, the epistemic uncertainty in motion estimation and the aleatoric uncertainty introduced by quantization lead to errors in the intermediate representations and introduce artifacts into the reconstructed frames. This inherent flaw limits the potential for higher bit-rate savings. To address this issue, we propose an uncertainty-aware video compression model that effectively captures predictive uncertainty with deep ensembles. Additionally, we introduce an ensemble-aware loss to encourage diversity among ensemble members and investigate the benefits of incorporating adversarial training into the video compression task. Experimental results on 1080p sequences show that our model achieves bit-rate savings of more than 20% compared with DVC Pro.
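The core training idea described in the abstract can be illustrated with a short, hedged sketch. The PyTorch code below is not the authors' implementation: `TinyFrameCodec`, `ensemble_loss`, and `diversity_weight` are hypothetical names, and the diversity term (per-pixel variance of member reconstructions, subtracted from the distortion of the ensemble mean) is only one plausible instantiation of an "ensemble-aware loss" that encourages disagreement among members. The paper's exact loss, rate term, motion-compensation pipeline, and adversarial-training schedule are omitted.

```python
# Minimal sketch of deep-ensemble prediction with a diversity-encouraging
# loss, under the assumptions stated above. Not the paper's actual codec.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyFrameCodec(nn.Module):
    """Toy stand-in for one ensemble member (a tiny reconstruction network)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 3, 3, padding=1),
        )

    def forward(self, x):
        return self.net(x)

def ensemble_loss(members, frame, diversity_weight=0.01):
    """Distortion of the ensemble-mean reconstruction minus a diversity bonus.

    The per-pixel variance across members serves both as the diversity term
    and as an estimate of predictive uncertainty (hypothetical formulation).
    """
    preds = torch.stack([m(frame) for m in members])       # (K, B, C, H, W)
    mean_pred = preds.mean(dim=0)
    var = preds.var(dim=0, unbiased=False)                  # ensemble disagreement
    distortion = F.mse_loss(mean_pred, frame)
    loss = distortion - diversity_weight * var.mean()       # reward disagreement
    return loss, var

# Usage: K = 4 members, one random 64x64 RGB "frame".
members = nn.ModuleList(TinyFrameCodec() for _ in range(4))
frame = torch.rand(1, 3, 64, 64)
loss, pixel_uncertainty = ensemble_loss(members, frame)
loss.backward()
print(loss.item(), pixel_uncertainty.mean().item())
```

In this toy form, the variance map doubles as an uncertainty estimate in the spirit of deep ensembles (Lakshminarayanan et al., 2017); a real codec would also include a rate term and the two-stage motion/residual structure the abstract describes.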
