One-Click Upgrade from 2D to 3D: Sandwiched RGB-D Video Compression for Stereoscopic Teleconferencing (2404.09979v1)

Published 15 Apr 2024 in cs.CV and eess.IV

Abstract: Stereoscopic video conferencing remains challenging because stereo RGB-D video must be compressed in real time. Although hardware implementations of standard video codecs such as H.264/AVC and HEVC are widely available, they are not designed for stereoscopic video and suffer from reduced quality and performance. Dedicated multiview or 3D extensions of these codecs are complex and lack efficient implementations. In this paper, we propose a new approach that upgrades a 2D video codec to support stereo RGB-D video compression by wrapping it with a neural pre- and post-processor pair. The neural networks are trained end-to-end with an image codec proxy and are shown to work with a more sophisticated video codec. We also propose a geometry-aware loss function to improve rendering quality. We train the neural pre- and post-processors on a synthetic 4D people dataset and evaluate them on both synthetic and real-captured stereo RGB-D videos. Experimental results show that the neural networks generalize well to unseen data and work out of the box with various video codecs. Our approach saves about 30% in bit rate compared with a conventional video coding scheme and with MV-HEVC at the same level of rendering quality from a novel view, without requiring a task-specific hardware upgrade.
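The abstract does not state how the "about 30% bit-rate saving at the same level of rendering quality" is computed. One conventional way to express a rate saving at equal quality is the Bjontegaard delta rate (BD-rate), sketched below under the assumption that quality is a higher-is-better metric such as PSNR; the function name and the example numbers are hypothetical.

```python
import numpy as np


def bd_rate(rate_anchor, quality_anchor, rate_test, quality_test):
    """Bjontegaard delta rate in percent (negative = the test codec saves bits).

    Fits log10(rate) as a cubic polynomial of quality for each RD curve,
    integrates both fits over the overlapping quality interval, and converts
    the average log-rate gap back to a percentage."""
    log_r_a = np.log10(np.asarray(rate_anchor, dtype=float))
    log_r_t = np.log10(np.asarray(rate_test, dtype=float))
    q_a = np.asarray(quality_anchor, dtype=float)
    q_t = np.asarray(quality_test, dtype=float)

    # Cubic fit of log-rate versus quality for each curve.
    p_a = np.polyfit(q_a, log_r_a, 3)
    p_t = np.polyfit(q_t, log_r_t, 3)

    # Only compare over the quality range both curves cover.
    lo = max(q_a.min(), q_t.min())
    hi = min(q_a.max(), q_t.max())
    if hi <= lo:
        raise ValueError("RD curves do not overlap in quality")

    int_a = np.polyval(np.polyint(p_a), hi) - np.polyval(np.polyint(p_a), lo)
    int_t = np.polyval(np.polyint(p_t), hi) - np.polyval(np.polyint(p_t), lo)
    avg_log_diff = (int_t - int_a) / (hi - lo)
    return (10 ** avg_log_diff - 1) * 100
```

For instance, if every rate on the test curve is 0.7 times the anchor rate at the same quality values, the function returns -30 (up to floating-point error), which is how a "saves about 30%" statement is usually read.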

Authors (7)
  1. Yueyu Hu
  2. Onur G. Guleryuz
  3. Philip A. Chou
  4. Danhang Tang
  5. Jonathan Taylor
  6. Rus Maxham
  7. Yao Wang
Citations (1)
