Sports Analysis and VR Viewing System Based on Player Tracking and Pose Estimation with Multimodal and Multiview Sensors (2405.01112v1)
Abstract: Sports analysis and viewing play a pivotal role in the current sports domain, offering significant value not only to coaches and athletes but also to fans and the media. In recent years, the rapid development of virtual reality (VR) and augmented reality (AR) technologies has introduced a new platform for watching games. Visualization of sports competitions in VR/AR represents a revolutionary technology, providing audiences with a novel immersive viewing experience. However, there is still a lack of related research in this area. In this work, we present for the first time a comprehensive system for sports competition analysis and real-time visualization on VR/AR platforms. First, we utilize multiview LiDARs and cameras to collect multimodal game data. Subsequently, we propose a framework for multi-player tracking and pose estimation based on a limited amount of supervised data, which extracts precise player positions and movements from point clouds and images. Moreover, we perform avatar modeling of players to obtain their 3D models. Finally, using this 3D player data, we conduct competition analysis and real-time visualization on VR/AR. Extensive quantitative experiments demonstrate the accuracy and robustness of our multi-player tracking and pose estimation framework. The visualization results showcase the immense potential of our sports visualization system for watching games on VR/AR devices. The multimodal competition dataset we collected and all related code will be released soon.
Authors:
- Wenxuan Guo
- Zhiyu Pan
- Ziheng Xi
- Alapati Tuerxun
- Jianjiang Feng
- Jie Zhou