Ultra Inertial Poser: Scalable Motion Capture and Tracking from Sparse Inertial Sensors and Ultra-Wideband Ranging (2404.19541v1)
Abstract: While camera-based capture systems remain the gold standard for recording human motion, learning-based tracking systems based on sparse wearable sensors are gaining popularity. Most commonly, they use inertial sensors, whose propensity for drift and jitter has so far limited tracking accuracy. In this paper, we propose Ultra Inertial Poser, a novel 3D full body pose estimation method that constrains drift and jitter in inertial tracking via inter-sensor distances. We estimate these distances across sparse sensor setups using a lightweight embedded tracker that augments inexpensive off-the-shelf 6D inertial measurement units with ultra-wideband radio-based ranging, dynamically and without the need for stationary reference anchors. Our method then fuses these inter-sensor distances with the 3D states estimated from each sensor. Our graph-based machine learning model processes the 3D states and distances to estimate a person's 3D full body pose and translation. To train our model, we synthesize inertial measurements and distance estimates from the motion capture database AMASS. For evaluation, we contribute a novel motion dataset of 10 participants who performed 25 motion types, captured by 6 wearable IMU+UWB trackers and an optical motion capture system, totaling 200 minutes of synchronized sensor data (UIP-DB). Our extensive experiments show state-of-the-art performance for our method over PIP and TIP, reducing position error from $13.62$ to $10.65\,\mathrm{cm}$ ($22\%$ better) and lowering jitter from $1.56$ to $0.055\,\mathrm{km/s^3}$ (a reduction of $97\%$).
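To make the notion of inter-sensor distances concrete, the following is a minimal Python sketch, not the authors' implementation. It computes the pairwise Euclidean distances between six tracker positions, which is one plausible way distance inputs could be synthesized from motion capture data. The sensor placement, the coordinates, and the helper names `pairwise_distances` and `relative_reduction` are assumptions made for illustration; the abstract does not specify them. The last helper simply re-derives the reported relative improvement in position error from the two values stated above.

```python
import math
from itertools import combinations


def pairwise_distances(positions):
    """Return {(i, j): distance} for every unordered pair of sensor positions."""
    return {
        (i, j): math.dist(positions[i], positions[j])
        for i, j in combinations(range(len(positions)), 2)
    }


def relative_reduction(before, after):
    """Fraction by which a metric decreased, e.g. 0.25 for a 25% reduction."""
    return (before - after) / before


# Hypothetical positions in metres for six trackers (assumed placement and
# values, chosen only for illustration; not taken from the paper).
sensors = [
    (0.0, 0.0, 1.0),   # pelvis
    (0.0, 0.0, 1.7),   # head
    (-0.4, 0.0, 1.0),  # left wrist
    (0.4, 0.0, 1.0),   # right wrist
    (-0.1, 0.0, 0.5),  # left knee
    (0.1, 0.0, 0.5),   # right knee
]

distances = pairwise_distances(sensors)
print(len(distances), "pairwise distances for", len(sensors), "sensors")

# Position error values as reported in the abstract (cm).
print(f"position error reduction: {relative_reduction(13.62, 10.65):.1%}")
```

With six trackers there are 15 unordered sensor pairs, so each frame would carry up to 15 distance values alongside the per-sensor inertial states. A real pipeline would additionally have to model ranging noise and occlusion, which this sketch ignores.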
Authors: Rayan Armani, Changlin Qian, Jiaxi Jiang, Christian Holz