Real-Time Simulated Avatar from Head-Mounted Sensors (2403.06862v2)

Published 11 Mar 2024 in cs.CV, cs.GR, and cs.RO

Abstract: We present SimXR, a method for controlling a simulated avatar from information (headset pose and cameras) obtained from AR / VR headsets. Due to the challenging viewpoint of head-mounted cameras, the human body is often clipped out of view, making traditional image-based egocentric pose estimation challenging. On the other hand, headset poses provide valuable information about overall body motion, but lack fine-grained details about the hands and feet. To synergize headset poses with cameras, we control a humanoid to track headset movement while analyzing input images to decide body movement. When body parts are seen, the movements of hands and feet will be guided by the images; when unseen, the laws of physics guide the controller to generate plausible motion. We design an end-to-end method that does not rely on any intermediate representations and learns to directly map from images and headset poses to humanoid control signals. To train our method, we also propose a large-scale synthetic dataset created using camera configurations compatible with a commercially available VR headset (Quest 2) and show promising results on real-world captures. To demonstrate the applicability of our framework, we also test it on an AR headset with a forward-facing camera.
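
As a purely illustrative sketch of the interface the abstract describes, assuming a PyTorch setting: a single network fuses an egocentric camera frame with the headset pose and maps them directly to per-joint control targets, with no intermediate pose representation. The class name, layer sizes, convolutional encoder, and pose parameterization below are assumptions for illustration, not the architecture used in the paper.

```python
import torch
import torch.nn as nn


class EgocentricAvatarController(nn.Module):
    """Toy end-to-end controller: (egocentric image, headset pose) -> joint targets."""

    def __init__(self, num_joints: int = 23, pose_dim: int = 9):
        super().__init__()
        # Small convolutional encoder for the head-mounted camera view
        # (a stand-in for whatever image backbone the actual method uses).
        self.image_encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Policy head: fuse image features with the headset pose (assumed here
        # to be 3D position plus a 6D rotation representation) and emit
        # per-joint control targets for the simulated humanoid.
        self.policy = nn.Sequential(
            nn.Linear(64 + pose_dim, 256), nn.ReLU(),
            nn.Linear(256, num_joints * 3),
        )

    def forward(self, image: torch.Tensor, headset_pose: torch.Tensor) -> torch.Tensor:
        # image: (B, 3, H, W) egocentric view; headset_pose: (B, pose_dim)
        features = self.image_encoder(image)
        return self.policy(torch.cat([features, headset_pose], dim=-1))


# Usage: one control step mapping raw sensor inputs directly to joint targets,
# which a physics simulator's joint controllers would then track.
controller = EgocentricAvatarController()
joint_targets = controller(torch.randn(1, 3, 128, 128), torch.randn(1, 9))
print(joint_targets.shape)  # torch.Size([1, 69])
```

In this sketch the physics simulator, rather than the network, is responsible for producing plausible motion for body parts the cameras cannot see, matching the division of labor the abstract outlines.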
