
Ultra Inertial Poser: Scalable Motion Capture and Tracking from Sparse Inertial Sensors and Ultra-Wideband Ranging (2404.19541v1)

Published 30 Apr 2024 in cs.CV, cs.AI, cs.GR, and eess.SP

Abstract: While camera-based capture systems remain the gold standard for recording human motion, learning-based tracking systems based on sparse wearable sensors are gaining popularity. Most commonly, they use inertial sensors, whose propensity for drift and jitter has so far limited tracking accuracy. In this paper, we propose Ultra Inertial Poser, a novel 3D full body pose estimation method that constrains drift and jitter in inertial tracking via inter-sensor distances. We estimate these distances across sparse sensor setups using a lightweight embedded tracker that augments inexpensive off-the-shelf 6D inertial measurement units with ultra-wideband radio-based ranging, dynamically and without the need for stationary reference anchors. Our method then fuses these inter-sensor distances with the 3D states estimated from each sensor. Our graph-based machine learning model processes the 3D states and distances to estimate a person's 3D full body pose and translation. To train our model, we synthesize inertial measurements and distance estimates from the motion capture database AMASS. For evaluation, we contribute a novel motion dataset of 10 participants who performed 25 motion types, captured by 6 wearable IMU+UWB trackers and an optical motion capture system, totaling 200 minutes of synchronized sensor data (UIP-DB). Our extensive experiments show state-of-the-art performance for our method over PIP and TIP, reducing position error from $13.62$ to $10.65\,cm$ ($22\%$ better) and lowering jitter from $1.56$ to $0.055\,km/s^3$ (a reduction of $97\%$).

References (76)
  1. CoolMoves: User Motion Accentuation in Virtual Reality. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 5, 2 (2021), 1–23.
  2. Robust ultra-wideband range error mitigation with deep learning at the edge. Engineering Applications of Artificial Intelligence 102 (2021), 104278. https://doi.org/10.1016/j.engappai.2021.104278
  3. Real-Time Low-Latency Tracking for UWB Tags. In Proceedings of the 20th Annual International Conference on Mobile Systems, Applications and Services (Portland, Oregon) (MobiSys ’22). Association for Computing Machinery, New York, NY, USA, 611–612. https://doi.org/10.1145/3498361.3538658
  4. NLOS Identification and Mitigation Using Low-Cost UWB Devices. Sensors 19, 16 (2019). https://doi.org/10.3390/s19163464
  5. Accurate position tracking with a single UWB anchor. In 2020 IEEE International Conference on Robotics and Automation (ICRA). 2344–2350. https://doi.org/10.1109/ICRA40945.2020.9197345
  6. Ching-Hang Chen and Deva Ramanan. 2017. 3D human pose estimation = 2D pose estimation + matching. In Proceedings of the IEEE conference on computer vision and pattern recognition. 7035–7043.
  7. Hybrid tracking of human operators using IMU/UWB data fusion by a Kalman filter. In 2008 3rd ACM/IEEE International Conference on Human-Robot Interaction (HRI). 193–200. https://doi.org/10.1145/1349822.1349848
  8. Performance capture from sparse multi-view video. In ACM SIGGRAPH 2008 papers. 1–10.
  9. Decawave. 2014. APS011 Application Note, Sources of error in DW1000 based two-way ranging (TWR) schemes.
  10. Decawave. 2017. How To Use, Configure and Program the DW1000 UWB.
  11. Decawave. 2018. APS014 Application Note, Antenna delay calibration of DW1000 based products and systems.
  12. Avatars Grow Legs: Generating Smooth Human Motion from Sparse Tracking Inputs with Diffusion Model. In CVPR.
  13. Kalman-Filter-Based Integration of IMU and UWB for High-Accuracy Indoor Positioning and Navigation. IEEE Internet of Things Journal 7, 4 (2020), 3133–3146. https://doi.org/10.1109/JIOT.2020.2965115
  14. Nima Ghorbani and Michael J. Black. 2021. SOMA: Solving Optical Marker-Based MoCap Automatically. In Proc. International Conference on Computer Vision (ICCV). 11117–11126.
  15. Human POSEitioning System (HPS): 3D human pose estimation and self-localization in large scenes from body-mounted sensors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 4318–4329.
  16. Livecap: Real-time human performance capture from monocular video. ACM Transactions On Graphics (TOG) 38, 2 (2019), 1–17.
  17. Deepcap: Monocular human performance capture using weak supervision. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 5052–5063.
  18. Omni-directional person tracking on a flying robot using occlusion-robust ultra-wideband signals. In 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). 189–194. https://doi.org/10.1109/IROS.2016.7759054
  19. Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural computation 9, 8 (1997), 1735–1780.
  20. Tightly coupled UWB/IMU pose estimation. In 2009 IEEE International Conference on Ultra-Wideband. 688–692. https://doi.org/10.1109/ICUWB.2009.5288724
  21. Conditional directed graph convolution for 3d human pose estimation. In Proceedings of the 29th ACM International Conference on Multimedia. 602–611.
  22. Deep inertial poser: Learning to reconstruct human pose from sparse inertial measurements in real time. ACM Transactions on Graphics (TOG) 37, 6 (2018), 1–15.
  23. IEEE. 2007. IEEE Standard for Information technology– Local and metropolitan area networks– Specific requirements– Part 15.4: Wireless Medium Access Control (MAC) and Physical Layer (PHY) Specifications for Low-Rate Wireless Personal Area Networks (WPANs): Amendment 1: Add Alternate PHYs. IEEE Std 802.15.4a-2007 (Amendment to IEEE Std 802.15.4-2006) (2007), 1–210. https://doi.org/10.1109/IEEESTD.2007.4299496
  24. EgoPoser: Robust Real-Time Ego-Body Pose Estimation in Large Scenes. arXiv preprint arXiv:2308.06493 (2023).
  25. Avatarposer: Articulated full-body pose tracking from sparse motion sensing. In Computer Vision–ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part V. Springer, 443–460.
  26. Towards flexible blind JPEG artifacts removal. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 4997–5006.
  27. Transformer Inertial Poser: Real-time Human Motion Reconstruction from Sparse IMUs with Simultaneous Terrain Generation. In SIGGRAPH Asia 2022 Conference Papers (Daegu, Republic of Korea) (SA ’22). Association for Computing Machinery, New York, NY, USA, Article 3, 9 pages. https://doi.org/10.1145/3550469.3555428
  28. End-to-end recovery of human shape and pose. In Proceedings of the IEEE conference on computer vision and pattern recognition. 7122–7131.
  29. EM-POSE: 3D Human Pose Estimation From Sparse Electromagnetic Trackers. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 11510–11520.
  30. Diederik P Kingma and Jimmy Ba. 2015. Adam: A Method for Stochastic Optimization. In International Conference on Learning Representations.
  31. Vibe: Video inference for human body pose and shape estimation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 5253–5263.
  32. Daniel Laidig and Thomas Seel. 2023. VQF: Highly accurate IMU orientation estimation with bias estimation and magnetic disturbance rejection. Information Fusion 91 (March 2023), 187–204. https://doi.org/10.1016/j.inffus.2022.10.014
  33. Angle of Arrival Estimation based on Channel Impulse Response Measurements. In 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). 6686–6692. https://doi.org/10.1109/IROS40897.2019.8967562
  34. Drone Positioning System Using UWB Sensing and Out-of-Band Control. IEEE Sensors Journal 22, 6 (2022), 5329–5343. https://doi.org/10.1109/JSEN.2021.3127233
  35. A mobile robot hand-arm teleoperation system by vision and IMU. In 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 10900–10906.
  36. Cascaded deep monocular 3d human pose estimation with evolutionary training data. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 6173–6183.
  37. Realtime human motion control with a small number of inertial sensors. In Symposium on interactive 3D graphics and games. 133–140.
  38. SMPL: A skinned multi-person linear model. ACM transactions on graphics (TOG) 34, 6 (2015), 1–16.
  39. AMASS: Archive of Motion Capture as Surface Shapes. In International Conference on Computer Vision. 5442–5451.
  40. COAP: Compositional articulated occupancy of people. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 13201–13210.
  41. Augmented reality and UWB technology fusion: Localization of objects with head mounted displays. In Proceedings of the 31st International Technical Meeting of the Satellite Division of The Institute of Navigation (ION GNSS+ 2018). 685–692.
  42. IMUPoser: Full-Body Pose Estimation Using IMUs in Phones, Watches, and Earbuds. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (Hamburg, Germany) (CHI ’23). Association for Computing Machinery, New York, NY, USA, Article 529, 12 pages. https://doi.org/10.1145/3544548.3581392
  43. Fusing ultra-wideband range measurements with accelerometers and rate gyroscopes for quadrocopter state estimation. In 2015 IEEE International Conference on Robotics and Automation (ICRA). 1730–1736. https://doi.org/10.1109/ICRA.2015.7139421
  44. An RNN-ensemble approach for real time human pose estimation from sparse IMUs. In Proceedings of the 3rd International Conference on Applications of Intelligent Systems. 1–6.
  45. Noitom. 2024. https://www.noitom.com/
  46. UWB and IMU-Based UAV’s Assistance System for Autonomous Landing on a Platform. Sensors 22, 6 (Mar 2022), 2347. https://doi.org/10.3390/s22062347
  47. Optitrack. 2023. https://www.optitrack.com/
  48. Fusing Monocular Images and Sparse IMU Signals for Real-Time Human Motion Capture. In SIGGRAPH Asia 2023 Conference Papers (Sydney, NSW, Australia) (SA ’23). Association for Computing Machinery, New York, NY, USA, Article 116, 11 pages. https://doi.org/10.1145/3610548.3618145
  49. VIO-UWB-Based Collaborative Localization and Dense Scene Reconstruction within Heterogeneous Multi-Robot Systems. arXiv:2011.00830 [cs.RO]
  50. EgoCap: Egocentric Marker-Less Motion Capture with Two Fisheye Cameras. ACM Trans. Graph. 35, 6, Article 162 (nov 2016), 11 pages. https://doi.org/10.1145/2980179.2980235
  51. Ultra-wideband Positioning Systems: Theoretical Limits, Ranging Algorithms, and Protocols. Cambridge University Press. https://doi.org/10.1017/CBO9780511541056
  52. Motion Capture from Body-Mounted Cameras. ACM Trans. Graph. 30, 4, Article 31 (jul 2011), 10 pages. https://doi.org/10.1145/2010324.1964926
  53. HOOV: Hand Out-Of-View Tracking for Proprioceptive Interaction using Inertial Sensing. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems. 1–16.
  54. Janis Tiemann and Christian Wietfeld. 2017. Scalable and precise multi-UAV indoor navigation using TDOA-based UWB localization. In 2017 International Conference on Indoor Positioning and Indoor Navigation (IPIN). 1–7. https://doi.org/10.1109/IPIN.2017.8115937
  55. xR-EgoPose: Egocentric 3D Human Pose From an HMD Camera. In 2019 IEEE/CVF International Conference on Computer Vision (ICCV). IEEE Computer Society, Los Alamitos, CA, USA, 7727–7737. https://doi.org/10.1109/ICCV.2019.00782
  56. DeepCIR: Insights into CIR-based Data-driven UWB Error Mitigation. In 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). 13300–13307. https://doi.org/10.1109/IROS47612.2022.9981931
  57. Total capture: 3d human pose estimation fusing video and inertial sensors. In Proceedings of 28th British Machine Vision Conference. 1–13.
  58. Ubisense. 2023. https://ubisense.com/
  59. Vicon. 2023. https://www.vicon.com/
  60. Practical motion capture in everyday surroundings. ACM transactions on graphics (TOG) 26, 3 (2007), 35–es.
  61. Recovering accurate 3d human pose in the wild using imus and a moving camera. In Proceedings of the European conference on computer vision (ECCV). 601–617.
  62. Sparse inertial poser: Automatic 3d human pose estimation from sparse imus. In Computer graphics forum, Vol. 36. Wiley Online Library, 349–360.
  63. Estimating egocentric 3d human pose in global space. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 11500–11509.
  64. Cbam: Convolutional block attention module. In Proceedings of the European conference on computer vision (ECCV). 3–19.
  65. Xsens. 2024. https://www.xsens.com/
  66. Omni-Swarm: A Decentralized Omnidirectional Visual–Inertial–UWB State Estimation System for Aerial Swarms. IEEE Transactions on Robotics 38, 6 (2022), 3374–3394. https://doi.org/10.1109/TRO.2022.3182503
  67. LoBSTr: Real-time Lower-body Pose Prediction from Sparse Upper-body Tracking Signals. In Computer Graphics Forum, Vol. 40. Wiley Online Library, 265–275.
  68. EgoLocate: Real-time Motion Capture, Localization, and Mapping with Sparse Body-mounted Sensors. ACM Transactions on Graphics (TOG) 42, 4, Article 76 (2023), 17 pages.
  69. Physical Inertial Poser (PIP): Physics-aware Real-time Human Motion Tracking from Sparse Inertial Sensors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 13167–13178.
  70. TransPose: real-time 3D human translation and pose estimation with six inertial sensors. ACM Transactions on Graphics (TOG) 40, 4 (2021), 1–13.
  71. Fusing wearable imus with multi-view images for human pose estimation: A geometric approach. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2200–2209.
  72. Semantic graph convolutional networks for 3d human pose regression. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 3425–3435.
  73. ULoc: Low-Power, Scalable and Cm-Accurate UWB-Tag Localization and Tracking for Indoor Applications. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 5, 3, Article 140 (sep 2021), 31 pages. https://doi.org/10.1145/3478124
  74. Multi-robot relative positioning and orientation system based on UWB range and graph optimization. Measurement 195 (2022), 111068. https://doi.org/10.1016/j.measurement.2022.111068
  75. Realistic Full-Body Tracking from Sparse Observations via Joint-Level Modeling. arXiv preprint arXiv:2308.08855 (2023).
  76. Zhiming Zou and Wei Tang. 2021. Modulated graph convolutional network for 3D human pose estimation. In Proceedings of the IEEE/CVF international conference on computer vision. 11477–11487.
Authors (4)
  1. Rayan Armani (4 papers)
  2. Changlin Qian (2 papers)
  3. Jiaxi Jiang (12 papers)
  4. Christian Holz (34 papers)
Citations (6)

Summary

An Analysis of "Ultra Inertial Poser: Scalable Motion Capture and Tracking from Sparse Inertial Sensors and Ultra-Wideband Ranging"

The paper "Ultra Inertial Poser: Scalable Motion Capture and Tracking from Sparse Inertial Sensors and Ultra-Wideband Ranging" presents a novel method for tracking full-body motion using a combination of sparse inertial measurement units (IMUs) and ultra-wideband (UWB) radios to provide inter-sensor distances. The authors propose a sophisticated approach that fuses these disparate data streams using a graph-based neural network to accurately estimate human poses.

Methodological Advancements

The authors implement a new wearable sensing system that integrates 6DOF IMUs with UWB radios on compact wireless nodes. This setup estimates orientation and acceleration and, importantly, introduces dynamic inter-sensor distance estimation via UWB ranging, circumventing the need for stationary anchors, a notable advance over previous systems that rely on environment-embedded infrastructure.
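
To make the ranging step concrete, here is a minimal sketch of single-sided two-way ranging (the scheme discussed in the cited Decawave application notes), which recovers an inter-node distance from timestamped message exchanges alone, with no anchors. The function and variable names are illustrative, not taken from the paper's firmware.

```python
# Sketch: single-sided two-way ranging (SS-TWR) between two UWB nodes.
# Assumes access to raw TX/RX timestamps from the radio.

SPEED_OF_LIGHT = 299_792_458.0  # m/s

def twr_distance(t_round: float, t_reply: float) -> float:
    """Estimate inter-node distance from one two-way ranging exchange.

    t_round: initiator's time from poll TX to response RX (seconds)
    t_reply: responder's turnaround time between poll RX and response TX
    """
    time_of_flight = (t_round - t_reply) / 2.0
    return SPEED_OF_LIGHT * time_of_flight

# Example: a 100 ns round trip with a 99 ns reply delay -> nodes ~15 cm apart
print(twr_distance(100e-9, 99e-9))  # ~0.15 m
```

In practice, clock drift between the two nodes biases this estimate, which is why ranging schemes and antenna-delay calibration (references 9 and 11) matter for centimeter-level accuracy.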

For processing, the authors use a two-branch architecture: an LSTM network captures temporal dynamics from the IMU data, while a Distance Attention Graph Convolutional Network (DA-GCN) incorporates the measured inter-sensor distances as spatial constraints. The two branches are then fused, yielding a consistent estimate of sensor positions relative to the body, a crucial step in improving overall pose accuracy.
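
As a rough illustration of such a two-branch design (not the authors' exact architecture), the PyTorch sketch below runs an LSTM over the stacked per-sensor states and a single distance-weighted graph convolution over the sensor graph, then concatenates the two feature streams for pose regression. All layer sizes, the softmax edge weighting, and the 6D-rotation output are illustrative assumptions.

```python
import torch
import torch.nn as nn

N_SENSORS, IMU_DIM, HID = 6, 12, 64  # 6 trackers, per-sensor state, hidden size

class TwoBranchPoser(nn.Module):
    def __init__(self):
        super().__init__()
        # Temporal branch: LSTM over the concatenated per-sensor IMU states
        self.lstm = nn.LSTM(N_SENSORS * IMU_DIM, HID, batch_first=True)
        # Spatial branch: one graph-conv layer over the sensor graph
        self.gcn_lin = nn.Linear(IMU_DIM, HID)
        # Fusion head: concatenate branch features, regress a SMPL-style pose
        self.head = nn.Linear(HID + N_SENSORS * HID, 24 * 6)  # 24 joints, 6D rot

    def forward(self, imu, dist):
        # imu:  (batch, time, N_SENSORS, IMU_DIM)
        # dist: (batch, time, N_SENSORS, N_SENSORS) inter-sensor distances
        b, t = imu.shape[:2]
        temporal, _ = self.lstm(imu.reshape(b, t, -1))   # (b, t, HID)
        # Turn distances into edge weights: nearer sensors couple more strongly
        adj = torch.softmax(-dist, dim=-1)               # (b, t, N, N)
        spatial = torch.relu(adj @ self.gcn_lin(imu))    # (b, t, N, HID)
        fused = torch.cat([temporal, spatial.reshape(b, t, -1)], dim=-1)
        return self.head(fused)                          # (b, t, 144)

model = TwoBranchPoser()
pose = model(torch.randn(2, 30, N_SENSORS, IMU_DIM),
             torch.rand(2, 30, N_SENSORS, N_SENSORS))
print(pose.shape)  # torch.Size([2, 30, 144])
```

The key design idea this mirrors is that the distance matrix modulates how strongly each sensor's features propagate to its neighbors, injecting the UWB spatial constraint directly into message passing.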

Empirical Evidence

The authors validate their methodology with a rigorously collected dataset (UIP-DB) of 10 participants performing 25 motion types. It comprises 200 minutes of motion data synchronously captured from the IMU+UWB trackers alongside a 20-camera optical system for ground truth. This data supports the paper's central claim: Ultra Inertial Poser reduces position error from 13.62 cm to 10.65 cm (a 22% improvement) over existing methods such as PIP (Physical Inertial Poser) and TIP (Transformer Inertial Poser), and lowers jitter by 97% (from 1.56 to 0.055 km/s³), highlighting the method's effectiveness in producing smooth and accurate motion predictions.
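
For intuition about these two metrics, the sketch below computes a mean per-joint position error in centimeters and a jitter estimate as the mean norm of the third finite-difference derivative of joint positions, which matches the km/s³ units reported above. The exact evaluation protocol (joint sets, alignment, frame rate) may differ from the paper's.

```python
import numpy as np

def mean_position_error_cm(pred, gt):
    """Mean per-joint position error in cm. pred, gt: (T, J, 3) in meters."""
    return 100.0 * np.linalg.norm(pred - gt, axis=-1).mean()

def jitter_km_s3(pred, fps=60.0):
    """Mean norm of the third derivative of joint positions, in km/s^3."""
    jerk = np.diff(pred, n=3, axis=0) * fps**3   # (T-3, J, 3), in m/s^3
    return np.linalg.norm(jerk, axis=-1).mean() / 1000.0

# Synthetic check: smooth trajectory plus ~1 cm of per-frame noise
T, J = 600, 24
gt = np.cumsum(np.random.randn(T, J, 3) * 1e-3, axis=0)
pred = gt + np.random.randn(T, J, 3) * 0.01
print(mean_position_error_cm(pred, gt), jitter_km_s3(pred))
```

Because jitter amplifies frame-to-frame noise cubically with the sampling rate, even small high-frequency errors dominate this metric, which is why the 97% reduction is a meaningful smoothness result.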

Implications and Future Directions

The implications of this work are multifaceted. Practically, the scalability and affordability of this sensor setup allow for widespread adoption in fields such as virtual reality, gaming, and rehabilitation, offering a promising alternative to bulkier camera-based systems. Theoretically, this research emphasizes the importance of integrating spatial constraints via UWB ranging to enhance inertial sensor capabilities, which could push the boundaries of mobile and untethered motion capture systems.

Looking forward, potential advancements might focus on enhancing the robustness of UWB-based ranging in more complex environments and broader human activities. Additionally, continual refinement in machine learning models, perhaps through hybrid models leveraging other sensing technologies, could improve accuracy and reduce the computational footprint, making real-time applications in even more resource-constrained environments feasible.

Overall, the paper significantly contributes to human pose estimation by innovatively combining sparse sensor data with spatial constraints, offering a valuable toolkit for the next generation of scalable and robust motion capture technologies.
