Mocap Everyone Everywhere: Lightweight Motion Capture With Smartwatches and a Head-Mounted Camera (2401.00847v2)
Abstract: We present a lightweight and affordable motion capture method based on two smartwatches and a head-mounted camera. In contrast to existing approaches that use six or more expert-level IMU devices, our approach is much more cost-effective and convenient. Our method can make wearable motion capture accessible to everyone everywhere, enabling 3D full-body motion capture in diverse environments. As a key idea to overcome the extreme sparsity and ambiguities of sensor inputs with different modalities, we integrate 6D head poses obtained from the head-mounted camera for motion estimation. To enable capture in expansive indoor and outdoor scenes, we propose an algorithm that tracks and updates floor level changes to define head poses, coupled with a multi-stage Transformer-based regression module. We also introduce novel strategies that leverage visual cues from egocentric images to further enhance motion capture quality while reducing ambiguities. We demonstrate the performance of our method in various challenging scenarios, including complex outdoor environments and everyday motions such as object interactions and social interactions among multiple individuals.