Dynamic Inertial Poser (DynaIP): Part-Based Motion Dynamics Learning for Enhanced Human Pose Estimation with Sparse Inertial Sensors (2312.02196v2)
Abstract: This paper introduces a novel human pose estimation approach using sparse inertial sensors, addressing the shortcomings of previous methods that rely on synthetic data. It leverages a diverse array of real inertial motion capture data from different skeleton formats to improve motion diversity and model generalization. The method features two innovative components: a pseudo-velocity regression model for capturing dynamic motion with inertial sensors, and a part-based model that divides the body and sensor data into three regions, each focusing on its unique characteristics. The approach outperforms state-of-the-art models across five public datasets, notably reducing pose error by 19% on the DIP-IMU dataset, and thus represents a significant improvement in inertial sensor-based human pose estimation. Our code is available at https://github.com/dx118/dynaip.
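To make the part-based idea more concrete, here is a minimal sketch of how readings from six IMUs might be regrouped into three region-specific input vectors, one per region sub-model. It makes several assumptions:

- The six-sensor placement is the one common in DIP-IMU-style setups: both wrists, both knees, head, and pelvis.
- Each sensor contributes a flattened 3×3 orientation matrix plus a 3-D acceleration.
- The region names, the `REGIONS` grouping, and `split_by_region` are hypothetical. They are not taken from the paper or its released code.

```python
from typing import Dict, List, Sequence

# Six-IMU placement commonly used in DIP-IMU-style setups (assumption).
SENSORS = ("left_wrist", "right_wrist", "left_knee", "right_knee", "head", "pelvis")

# Hypothetical three-region split; NOT the paper's confirmed grouping.
# The pelvis (root) sensor is shared by every region as a common reference.
REGIONS: Dict[str, Sequence[str]] = {
    "upper": ("left_wrist", "right_wrist", "pelvis"),
    "lower": ("left_knee", "right_knee", "pelvis"),
    "torso": ("head", "pelvis"),
}

# Assumed per-sensor layout: 9 rotation-matrix entries + 3 acceleration components.
FEATURES_PER_SENSOR = 12


def split_by_region(frame: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Concatenate per-sensor features into one input vector per body region."""
    out: Dict[str, List[float]] = {}
    for region, names in REGIONS.items():
        vec: List[float] = []
        for name in names:
            feats = frame[name]
            if len(feats) != FEATURES_PER_SENSOR:
                raise ValueError(
                    f"{name}: expected {FEATURES_PER_SENSOR} features, got {len(feats)}"
                )
            vec.extend(feats)
        out[region] = vec
    return out


if __name__ == "__main__":
    # Dummy frame: identity orientation and zero acceleration for every sensor.
    identity = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0]
    frame = {name: identity + [0.0, 0.0, 0.0] for name in SENSORS}
    for region, vec in split_by_region(frame).items():
        print(region, len(vec))  # upper 36, lower 36, torso 24
```

Sharing the root sensor across regions is one plausible design choice, because each region's sub-model then sees a common body-level reference. The actual grouping and feature layout in DynaIP may differ.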
- Yu Zhang
- Songpengcheng Xia
- Lei Chu
- Jiarui Yang
- Qi Wu
- Ling Pei