
HMD-Poser: On-Device Real-time Human Motion Tracking from Scalable Sparse Observations (2403.03561v1)

Published 6 Mar 2024 in cs.CV

Abstract: It is especially challenging to achieve real-time human motion tracking on a standalone VR Head-Mounted Display (HMD) such as Meta Quest and PICO. In this paper, we propose HMD-Poser, the first unified approach to recover full-body motions using scalable sparse observations from HMD and body-worn IMUs. In particular, it supports a variety of input scenarios, such as HMD, HMD+2IMUs, HMD+3IMUs, etc. This scalability of inputs accommodates users' preferences for either high tracking accuracy or ease of wear. A lightweight temporal-spatial feature learning network is proposed in HMD-Poser to guarantee that the model runs in real time on HMDs. Furthermore, HMD-Poser presents online body shape estimation to improve the position accuracy of body joints. Extensive experimental results on the challenging AMASS dataset show that HMD-Poser achieves new state-of-the-art results in both accuracy and real-time performance. We also build a new free-dancing motion dataset to evaluate HMD-Poser's on-device performance and investigate the performance gap between synthetic data and real-captured sensor data. Finally, we demonstrate our HMD-Poser with a real-time avatar-driving application on a commercial HMD. Our code and free-dancing motion dataset are available at https://pico-ai-team.github.io/hmd-poser
