Low-Latency Neural Stereo Streaming (2403.17879v1)

Published 26 Mar 2024 in cs.CV and eess.IV

Abstract: The rise of new video modalities like virtual reality or autonomous driving has increased the demand for efficient multi-view video compression methods, both in terms of rate-distortion (R-D) performance and in terms of delay and runtime. While most recent stereo video compression approaches have shown promising performance, they compress left and right views sequentially, leading to poor parallelization and runtime performance. This work presents Low-Latency neural codec for Stereo video Streaming (LLSS), a novel parallel stereo video coding method designed for fast and efficient low-latency stereo video streaming. Instead of using sequential cross-view motion compensation like existing methods, LLSS introduces a bidirectional feature shifting module to directly exploit mutual information among views and encode them effectively with a joint cross-view prior model for entropy coding. Thanks to this design, LLSS processes left and right views in parallel, minimizing latency, while substantially improving R-D performance compared to both existing neural and conventional codecs.
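
The abstract does not spell out how the bidirectional feature shifting module works. As a rough illustration only, the sketch below assumes that "shifting" means translating one view's feature map horizontally by a few candidate offsets, in opposite directions for each view, so that each view receives shifted copies of the other as context. The function names, the single-channel list-of-lists layout, and the offset set are all hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

FeatureMap = List[List[float]]  # one channel, indexed [row][column] (assumed layout)


def shift_horizontal(fmap: FeatureMap, offset: int) -> FeatureMap:
    """Shift every row by `offset` columns (positive = right), zero-filling the gap."""
    width = len(fmap[0])
    shifted = []
    for row in fmap:
        new_row = [0.0] * width
        for x, value in enumerate(row):
            target = x + offset
            if 0 <= target < width:
                new_row[target] = value
        shifted.append(new_row)
    return shifted


def bidirectional_shift(
    left: FeatureMap, right: FeatureMap, offsets: List[int]
) -> Tuple[List[FeatureMap], List[FeatureMap]]:
    """Build cross-view context for both views (hypothetical helper).

    Neither result depends on the other, so the two list comprehensions
    could run in parallel, unlike a sequential left-then-right scheme.
    """
    # In a rectified stereo pair, a point at column x in the left view
    # appears at column x - d in the right view, so right features are
    # shifted by +d to line up with the left view, and left features by -d.
    left_context = [shift_horizontal(right, +d) for d in offsets]
    right_context = [shift_horizontal(left, -d) for d in offsets]
    return left_context, right_context


# Toy example: one feature at disparity 1 between the two views.
left_feat = [[0.0, 0.0, 1.0, 0.0]]
right_feat = [[0.0, 1.0, 0.0, 0.0]]
left_ctx, right_ctx = bidirectional_shift(left_feat, right_feat, offsets=[0, 1])
```

In this toy case the copy shifted by the true disparity (offset 1) lines up exactly with the other view. In a learned codec, a network would presumably weigh the candidate shifts rather than pick one by hand.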

Authors
  1. Qiqi Hou
  2. Farzad Farhadzadeh
  3. Amir Said
  4. Hoang Le
  5. Guillaume Sautiere
