LiveHPS: LiDAR-based Scene-level Human Pose and Shape Estimation in Free Environment (2402.17171v1)
Abstract: In human-centric large-scale scenes, fine-grained modeling of 3D human global pose and shape is important for scene understanding and benefits many real-world applications. In this paper, we present LiveHPS, a novel single-LiDAR approach to scene-level human pose and shape estimation that is unaffected by lighting conditions and requires no wearable devices. In particular, we design a distillation mechanism to mitigate the effect of the varying distribution of LiDAR point clouds, and we exploit the temporal-spatial geometric and dynamic information in consecutive frames to handle occlusion and noise. With its efficient configuration and high-quality output, LiveHPS is well suited to real-world applications. Moreover, we propose FreeMotion, a large-scale human motion dataset collected in various scenarios with diverse human poses, shapes, and translations. It consists of multi-modal, multi-view data captured by calibrated and synchronized LiDARs, cameras, and IMUs. Extensive experiments on our new dataset and other public datasets demonstrate the state-of-the-art (SOTA) performance and robustness of our approach. We will release our code and dataset soon.
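The abstract emphasizes two properties of the input: per-frame human point clouds whose density varies with distance from the sensor, and a sequence of consecutive frames. As a hedged illustration only, the sketch below shows a generic preprocessing step common to point-based networks: resample each frame to a fixed number of points, center it, and keep the per-frame centroid as a coarse global-translation cue, yielding a T x n x 3 window. This is an assumption for illustration and is not the paper's distillation mechanism; the function names `resample_frame`, `center_frame`, `build_window` and the default `n=256` are hypothetical.

```python
import random
from typing import List, Tuple

Point = Tuple[float, float, float]


def resample_frame(points: List[Point], n: int, rng: random.Random) -> List[Point]:
    """Return exactly n points: subsample without replacement when the frame is
    dense, otherwise pad by repeating randomly chosen points (hypothetical helper)."""
    if not points:
        raise ValueError("frame contains no points")
    if len(points) >= n:
        return rng.sample(points, n)
    extra = [rng.choice(points) for _ in range(n - len(points))]
    return list(points) + extra


def center_frame(points: List[Point]) -> Tuple[List[Point], Point]:
    """Translate the frame so its centroid is at the origin.
    The centroid is returned so the global translation is not lost."""
    k = len(points)
    cx = sum(p[0] for p in points) / k
    cy = sum(p[1] for p in points) / k
    cz = sum(p[2] for p in points) / k
    centered = [(x - cx, y - cy, z - cz) for x, y, z in points]
    return centered, (cx, cy, cz)


def build_window(frames: List[List[Point]], n: int = 256,
                 seed: int = 0) -> Tuple[List[List[Point]], List[Point]]:
    """Turn T consecutive frames of varying size into a T x n x 3 window
    plus one centroid per frame (assumed input layout, not the paper's)."""
    rng = random.Random(seed)
    window, centroids = [], []
    for frame in frames:
        centered, centroid = center_frame(resample_frame(frame, n, rng))
        window.append(centered)
        centroids.append(centroid)
    return window, centroids


# Usage: a dense frame (subject near the sensor) and a sparse one (far away).
near = [(0.01 * i, 0.0, 1.0) for i in range(400)]
far = [(0.05 * i, 0.0, 1.0) for i in range(40)]
window, centroids = build_window([near, far], n=128)
print(len(window), len(window[0]), len(window[1]))
```

Resampling only equalizes the point count per frame; it does not correct the uneven spatial distribution of points over the body, which is the effect the abstract's distillation mechanism is designed to address.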
- Yiming Ren
- Xiao Han
- Chengfeng Zhao
- Jingya Wang
- Lan Xu
- Jingyi Yu
- Yuexin Ma