LiveVV: Human-Centered Live Volumetric Video Streaming System (2310.08205v2)
Abstract: Volumetric video has emerged as a prominent medium within eXtended Reality (XR), driven by advances in computer graphics and depth-capture hardware. Users can fully immerse themselves in volumetric video and switch their viewport with six degrees of freedom (6DoF): three rotational dimensions (yaw, pitch, roll) and three translational dimensions (X, Y, Z). Unlike traditional 2D videos, which are composed of pixel matrices, volumetric videos represent a scene with point clouds, meshes, or voxels, resulting in significantly larger data sizes. While previous works have successfully achieved volumetric video streaming in video-on-demand scenarios, live streaming of volumetric video remains an open challenge due to limited network bandwidth and stringent latency constraints. In this paper, we propose, for the first time, a holistic live volumetric video streaming system, LiveVV, which covers multi-view capture, scene segmentation and reuse, adaptive transmission, and rendering. LiveVV contains multiple lightweight volumetric video capture modules that can be deployed without prior preparation. To reduce bandwidth consumption, LiveVV processes static and dynamic volumetric content separately, reusing static data with low disparity and decimating data with low visual saliency. In addition, to cope with network fluctuation, LiveVV integrates a volumetric video adaptive bitrate streaming algorithm (VABR) that enables smooth playback while maximizing quality of experience (QoE). Extensive real-world experiments show that LiveVV can achieve live volumetric video streaming at a frame rate of 24 fps with a latency of less than 350 ms.
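The abstract names two bandwidth-saving steps (reusing static data with low disparity, and decimating data with low visual saliency) without defining them. The Python sketch below illustrates the general idea under assumptions of our own, not LiveVV's actual method: disparity is measured here as a symmetric mean nearest-neighbour (chamfer-style) distance between the current and cached static points; per-point saliency scores in [0, 1] are assumed to come from some upstream model; and all function names (`select_static`, `decimate_by_saliency`, etc.) as well as the `threshold`, `cutoff`, and `stride` parameters are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]


def mean_nn_distance(src: Sequence[Point], ref: Sequence[Point]) -> float:
    """Mean distance from each point in `src` to its nearest neighbour in `ref`.

    Brute force, O(len(src) * len(ref)); a real pipeline would use a k-d tree.
    Returns infinity if either cloud is empty, so empty input never counts
    as "low disparity".
    """
    if not src or not ref:
        return math.inf
    return sum(min(math.dist(p, q) for q in ref) for p in src) / len(src)


def static_disparity(current: Sequence[Point], cached: Sequence[Point]) -> float:
    """Hypothetical disparity metric: symmetric chamfer-style mean NN distance."""
    return 0.5 * (mean_nn_distance(current, cached) + mean_nn_distance(cached, current))


def select_static(
    current: Sequence[Point],
    cached: Optional[Sequence[Point]],
    threshold: float,
) -> Tuple[List[Point], bool]:
    """Decide whether the static part of a frame must be (re)transmitted.

    Returns (points_to_send, reused). If the disparity to the receiver's
    cached copy is at most `threshold`, nothing is sent and the cache is reused.
    """
    if cached is not None and static_disparity(current, cached) <= threshold:
        return [], True
    return list(current), False


def decimate_by_saliency(
    points: Sequence[Point],
    saliency: Sequence[float],
    cutoff: float,
    stride: int,
) -> List[Point]:
    """Keep every point with saliency >= cutoff; of the low-saliency points,
    keep only every `stride`-th one (uniform decimation)."""
    if len(points) != len(saliency):
        raise ValueError("points and saliency must have the same length")
    if stride < 1:
        raise ValueError("stride must be >= 1")
    kept: List[Point] = []
    low_seen = 0
    for p, s in zip(points, saliency):
        if s >= cutoff:
            kept.append(p)
        else:
            if low_seen % stride == 0:
                kept.append(p)
            low_seen += 1
    return kept


if __name__ == "__main__":
    cached = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    current = [(0.0, 0.0, 0.01), (1.0, 0.0, 0.0)]  # only 1 cm of jitter
    print(select_static(current, cached, threshold=0.05))  # ([], True)
```

In this sketch, a receiver would render its cached static points whenever `reused` is true, while the dynamic region would be sent every frame, optionally thinned by `decimate_by_saliency`. Brute-force nearest-neighbour search keeps the example dependency-free; a spatial index would be needed at real point-cloud sizes.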