
Mocap Everyone Everywhere: Lightweight Motion Capture With Smartwatches and a Head-Mounted Camera (2401.00847v2)

Published 1 Jan 2024 in cs.CV and cs.GR

Abstract: We present a lightweight and affordable motion capture method based on two smartwatches and a head-mounted camera. In contrast to the existing approaches that use six or more expert-level IMU devices, our approach is much more cost-effective and convenient. Our method can make wearable motion capture accessible to everyone everywhere, enabling 3D full-body motion capture in diverse environments. As a key idea to overcome the extreme sparsity and ambiguities of sensor inputs with different modalities, we integrate 6D head poses obtained from the head-mounted cameras for motion estimation. To enable capture in expansive indoor and outdoor scenes, we propose an algorithm to track and update floor level changes to define head poses, coupled with a multi-stage Transformer-based regression module. We also introduce novel strategies leveraging visual cues of egocentric images to further enhance the motion capture quality while reducing ambiguities. We demonstrate the performance of our method on various challenging scenarios, including complex outdoor environments and everyday motions including object interactions and social interactions among multiple individuals.


Summary

  • The paper introduces a low-cost method for full-body motion capture using a head-mounted camera and two smartwatches to reduce sensor requirements.
  • It leverages a multi-stage Transformer-based regression module and floor-level tracking to achieve accurate motion capture on uneven terrain.
  • The approach democratizes motion capture technology, enabling affordable, high-quality data collection for research in sports, healthcare, and interactive media.

Introduction to Motion Capture Technology

Motion capture technology is vital for replicating the intricacies of human movement in virtual environments, films, and interactive systems. Traditional approaches typically require many sensors and a controlled capture environment, which limits accessibility and convenience. Moreover, collecting motion data that accurately reflects varied real-world scenarios is difficult when specialized equipment and complex setups are needed, leaving the field with comparatively little data relative to domains such as imagery and language.

Democratization of Motion Capture

Aiming to make motion capture more accessible, this work proposes a novel method that requires only a head-mounted camera and two smartwatches. This setup dramatically reduces the cost and complexity of capturing motion and does not require the individual to be in a specific location. With the advent of smartwatches and wearable cameras, motion capture can now be conducted indoors or outdoors, capturing a wide variety of movements, from daily interactions to various outdoor activities.

Enhancing Motion Capture with Smart Technology

To cope with the extreme sparsity of input from only two smartwatches, the system integrates 6D head poses obtained from the head-mounted camera as an additional signal. Because those head poses are only meaningful relative to the ground the wearer stands on, an algorithm tracks and updates the floor level, keeping motion capture accurate even on uneven terrain such as stairs or hills. A multi-stage Transformer-based regression module then processes the fused sensor data to estimate full-body movement.
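
The paper's exact floor-tracking procedure is not reproduced here, so the following is only a minimal sketch of the general idea under stated assumptions: maintain a running floor-height estimate and shift it when the head deviates from the wearer's nominal standing height in a way consistent with a level change. The class name, thresholds, and update rule are all illustrative assumptions.

```python
# Hypothetical sketch: class name, thresholds, and update rule are
# illustrative assumptions, not the paper's implementation.
class FloorLevelTracker:
    """Running estimate of the current floor height, so that head poses can
    be expressed relative to whatever floor the wearer is standing on."""

    def __init__(self, standing_head_height, jump_threshold=0.25,
                 drift_gain=0.05):
        self.floor_y = 0.0                          # estimated floor height (m)
        self.standing_head_height = standing_head_height  # per-user calibration (m)
        self.jump_threshold = jump_threshold        # offset implying a level change (m)
        self.drift_gain = drift_gain                # slow correction for SLAM drift

    def update(self, head_y, is_walking):
        # Deviation of the head from its nominal standing height, measured
        # against the currently assumed floor.
        offset = (head_y - self.floor_y) - self.standing_head_height
        if abs(offset) > self.jump_threshold:
            if is_walking:
                # Large offset while walking: the wearer has likely moved to
                # a new floor level (stairs, slope), so shift the estimate.
                # Gating on locomotion avoids mistaking a crouch for a floor change.
                self.floor_y += offset
            # Otherwise (crouching, sitting) leave the floor estimate alone.
        elif is_walking:
            # Absorb small deviations slowly to compensate for SLAM drift.
            self.floor_y += self.drift_gain * offset
        return head_y - self.floor_y                # floor-relative head height
```

Expressing head height relative to a tracked floor, rather than a fixed world origin, is what lets a single standing-height calibration remain valid across stairs, slopes, and multi-level scenes.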

Additionally, the method leverages visual cues from the egocentric camera to resolve ambiguities that sparse IMU sensing alone cannot. These cues are especially valuable when the wearer handles objects or interacts with other people, since the image content provides context that the inertial signals lack.
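
To illustrate how such visual cues might be fused with the watch signals, the PyTorch sketch below projects per-frame IMU readings and egocentric image features into a shared token space and runs a temporal Transformer encoder over them. Every module name, dimension, and the single-stage design is an assumption for illustration; the paper's actual multi-stage architecture is not reproduced here.

```python
# Hypothetical fusion sketch in PyTorch: module names, dimensions, and the
# single-stage design are illustrative assumptions, not the paper's
# multi-stage architecture.
import torch
import torch.nn as nn

class SensorFusionRegressor(nn.Module):
    """Fuses per-frame IMU signals and egocentric image features, then
    regresses per-joint rotations with a temporal Transformer encoder."""

    def __init__(self, imu_dim=24, img_feat_dim=256, d_model=128, n_joints=22):
        super().__init__()
        self.imu_proj = nn.Linear(imu_dim, d_model)       # project IMU readings
        self.img_proj = nn.Linear(img_feat_dim, d_model)  # project image features
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_joints * 6)      # 6D rotation per joint

    def forward(self, imu_seq, img_feat_seq):
        # imu_seq: (B, T, imu_dim) flattened watch/head inertial signals.
        # img_feat_seq: (B, T, img_feat_dim) per-frame egocentric features,
        # e.g. from a pretrained image backbone.
        tokens = self.imu_proj(imu_seq) + self.img_proj(img_feat_seq)
        h = self.encoder(tokens)          # temporal attention over the window
        out = self.head(h)                # (B, T, n_joints * 6)
        return out.view(out.shape[0], out.shape[1], -1, 6)
```

Predicting a continuous 6D rotation representation per joint, rather than Euler angles or quaternions, is a standard choice in recent pose-estimation work because it avoids representation discontinuities during training.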

Contributions and Potential Applications

The research makes several notable contributions to motion capture: it presents the first method to achieve high-quality full-body capture from consumer-level devices alone, it tracks and updates floor levels across a wide range of environments, and it improves capture quality by leveraging visual information from the head-mounted camera.

With its lightweight and affordable solution, the proposed method is a step towards democratizing motion capture technology, making it possible for researchers to explore new areas with motion capture data and enabling a wider audience to create detailed 3D animations with everyday devices. This breakthrough can potentially revolutionize fields like sports analysis, healthcare, animation, and interactive media, opening up new possibilities for understanding and generating human motion data in natural settings.