DivaTrack: Diverse Bodies and Motions from Acceleration-Enhanced Three-Point Trackers (2402.09211v1)
Abstract: Full-body avatar presence is crucial for immersive social and environmental interactions in digital reality. However, current devices provide only three six-degree-of-freedom (6-DOF) poses, from the headset and two controllers (i.e., three-point trackers). Because the problem is highly under-constrained, inferring full-body pose from these inputs is challenging, especially when supporting the full range of body proportions and use cases represented by the general population. In this paper, we propose a deep learning framework, DivaTrack, which outperforms existing methods when applied to diverse body sizes and activities. We augment the sparse three-point inputs with linear accelerations from inertial measurement units (IMUs) to improve foot contact prediction. We then condition the otherwise ambiguous lower-body pose on the predictions of foot contact and upper-body pose in a two-stage model. We further stabilize the inferred full-body pose in a wide range of configurations by learning to blend predictions that are computed in two reference frames, each of which is designed for different types of motions. We demonstrate the effectiveness of our design on a large dataset that captures 22 subjects performing locomotion that is challenging for three-point tracking, including lunges, hula-hooping, and sitting. As shown in a live demo using the Meta VR headset and Xsens IMUs, our method runs in real time while accurately tracking a user's motion when they perform a diverse set of movements.
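The idea of blending predictions computed in two reference frames can be illustrated with a minimal sketch. This is not the DivaTrack implementation: the abstract does not define the two frames or how the blend is learned, so the yaw-only frame, the y-up convention, the function names `to_world` and `blend_predictions`, and the externally supplied `weight` (which in the paper would come from a learned model) are all assumptions made for illustration.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def to_world(p: Vec3, origin: Vec3, yaw: float) -> Vec3:
    """Map a point from a hypothetical yaw-rotated local frame (y-up) to world space."""
    c, s = math.cos(yaw), math.sin(yaw)
    x, y, z = p
    # Rotate about the vertical (y) axis, then translate by the frame origin.
    return (c * x + s * z + origin[0], y + origin[1], -s * x + c * z + origin[2])


def blend_predictions(pred_a: List[Vec3], pred_b: List[Vec3], weight: float) -> List[Vec3]:
    """Linearly blend two sets of world-space joint positions.

    `weight` is the share given to `pred_a`; it is passed in here as a plain
    number, whereas a learned model would predict it (assumption).
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    if len(pred_a) != len(pred_b):
        raise ValueError("both predictions must have the same number of joints")
    return [
        tuple(weight * a + (1.0 - weight) * b for a, b in zip(joint_a, joint_b))
        for joint_a, joint_b in zip(pred_a, pred_b)
    ]


# Usage: two single-joint predictions, each expressed in its own local frame.
local_a = [(0.0, 1.0, 0.0)]
local_b = [(0.0, 1.0, 0.5)]
world_a = [to_world(p, origin=(1.0, 0.0, 0.0), yaw=0.0) for p in local_a]
world_b = [to_world(p, origin=(1.0, 0.0, 0.0), yaw=0.0) for p in local_b]
blended = blend_predictions(world_a, world_b, weight=0.25)
```

Both predictions are first brought into a common world frame, because a blend is only meaningful between quantities expressed in the same coordinates. The sketch blends joint positions only; blending joint rotations would need a rotation-aware interpolation instead of a linear mix.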
Authors: Dongseok Yang, Jiho Kang, Lingni Ma, Joseph Greer, Yuting Ye, Sung-Hee Lee