LiveVV: Human-Centered Live Volumetric Video Streaming System (2310.08205v2)

Published 12 Oct 2023 in cs.MM and cs.HC

Abstract: Volumetric video has emerged as a prominent medium within the realm of eXtended Reality (XR) with the advancements in computer graphics and depth capture hardware. Users can fully immerse themselves in volumetric video with the ability to switch their viewport in six degrees of freedom (6DoF), including three rotational dimensions (yaw, pitch, roll) and three translational dimensions (X, Y, Z). Unlike traditional 2D videos, which are composed of pixel matrices, volumetric videos employ point clouds, meshes, or voxels to represent a volumetric scene, resulting in significantly larger data sizes. While previous works have successfully achieved volumetric video streaming in video-on-demand scenarios, the live streaming of volumetric video remains an unresolved challenge due to limited network bandwidth and stringent latency constraints. In this paper, we propose, for the first time, a holistic live volumetric video streaming system, LiveVV, which achieves multi-view capture, scene segmentation & reuse, adaptive transmission, and rendering. LiveVV contains multiple lightweight volumetric video capture modules that can be deployed without prior preparation. To reduce bandwidth consumption, LiveVV processes static and dynamic volumetric content separately, reusing static data with low disparity and decimating data with low visual saliency. In addition, to cope with network fluctuations, LiveVV integrates a volumetric video adaptive bitrate streaming algorithm (VABR) that enables smooth playback while maximizing quality of experience (QoE). Extensive real-world experiments show that LiveVV can achieve live volumetric video streaming at a frame rate of 24 fps with a latency of less than 350 ms.
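The abstract says LiveVV saves bandwidth by reusing static content whose disparity from the previously transmitted static scene is low, but it does not say how disparity is measured. The sketch below assumes one simple option: voxelize both static point sets on a uniform grid and take one minus the Jaccard similarity of the occupied voxels. The function names (`voxelize`, `disparity`, `plan_static_update`), the 5 cm voxel size, and the 0.1 reuse threshold are hypothetical choices for illustration, not LiveVV's actual method or parameters.

```python
import math
from typing import Iterable, Set, Tuple

Point = Tuple[float, float, float]   # (x, y, z), assumed to be in meters
Voxel = Tuple[int, int, int]         # integer grid index


def voxelize(points: Iterable[Point], voxel_size: float) -> Set[Voxel]:
    """Return the set of grid cells occupied by at least one point."""
    return {
        (math.floor(x / voxel_size), math.floor(y / voxel_size), math.floor(z / voxel_size))
        for x, y, z in points
    }


def disparity(current: Set[Voxel], cached: Set[Voxel]) -> float:
    """Assumed metric: 1 - Jaccard similarity of occupied voxels.

    0.0 means identical occupancy; 1.0 means no overlap at all.
    """
    union = current | cached
    if not union:
        return 0.0  # two empty scenes are treated as identical
    return 1.0 - len(current & cached) / len(union)


def plan_static_update(
    current_static: Iterable[Point],
    cached_static: Iterable[Point],
    voxel_size: float = 0.05,   # hypothetical 5 cm grid
    threshold: float = 0.1,     # hypothetical reuse threshold
) -> Tuple[bool, float]:
    """Decide whether the static layer must be re-sent.

    Returns (resend, d): resend is False when the receiver's cached static
    scene is close enough to reuse, so only dynamic content is transmitted.
    """
    d = disparity(voxelize(current_static, voxel_size), voxelize(cached_static, voxel_size))
    return d > threshold, d


if __name__ == "__main__":
    cached = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    jittered = [(0.01, 0.0, 0.0), (1.01, 0.0, 0.0), (0.0, 1.01, 0.0)]
    print(plan_static_update(jittered, cached))  # (False, 0.0)
```

In this example, sub-voxel jitter (such as depth-sensor noise) maps to the same voxels, so the cached static scene is reused. Adding one point in a new cell to a three-cell scene raises the disparity to 0.25 and triggers a resend. A set-based occupancy test is cheap but coarse; a real system might prefer a point-distance measure such as Chamfer distance.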
