Hybrid 3D Human Pose Estimation with Monocular Video and Sparse IMUs (2404.17837v1)

Published 27 Apr 2024 in cs.CV and cs.HC

Abstract: Temporal 3D human pose estimation from monocular videos is a challenging task in human-centered computer vision due to the depth ambiguity of 2D-to-3D lifting. To improve accuracy and address occlusion, inertial sensors have been introduced as a complementary source of information. However, it remains challenging to integrate heterogeneous sensor data to produce physically plausible 3D human poses. In this paper, we propose a novel framework, Real-time Optimization and Fusion (RTOF), to address this issue. We first incorporate sparse inertial orientations into a parametric human skeleton to refine the 3D poses kinematically. The poses are then optimized with energy functions built on both visual and inertial observations to reduce temporal jitter. Our framework outputs smooth and biomechanically plausible human motion. Comprehensive experiments with ablation studies demonstrate its rationality and efficiency. On the Total Capture dataset, the pose estimation error is significantly decreased compared to the baseline method.
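
The abstract does not specify the form of the energy functions, so the following is only an illustrative sketch of the general idea, not the RTOF formulation. It assumes a single bone (a parent and a child joint), purely quadratic terms, and equal default weights. The function names `fuse_bone_trajectory` and `jitter`, the weights `w_vis`, `w_imu`, `w_smooth`, and the reduction of an IMU orientation to a unit bone direction are all hypothetical choices made for this example. Under those assumptions the energy combines a visual term, an inertial term, and a temporal smoothness term, and can be minimized by linear least squares.

```python
import numpy as np


def fuse_bone_trajectory(visual_xyz, imu_dirs, bone_length,
                         w_vis=1.0, w_imu=1.0, w_smooth=1.0):
    """Minimize an assumed quadratic energy over a two-joint trajectory.

    visual_xyz : (T, 2, 3) noisy parent/child positions from a video estimator
    imu_dirs   : (T, 3) unit bone directions derived from IMU orientations
    bone_length: scalar bone length taken from the skeleton model
    """
    T = visual_xyz.shape[0]
    n = 2 * T  # unknowns per axis; joint j at frame t has index 2 * t + j

    # Visual term: stay close to the video-based estimate.
    A_vis = np.sqrt(w_vis) * np.eye(n)
    b_vis = np.sqrt(w_vis) * visual_xyz.reshape(n, 3)

    # Inertial term: child - parent should equal bone_length * IMU direction.
    A_imu = np.zeros((T, n))
    for t in range(T):
        A_imu[t, 2 * t] = -1.0
        A_imu[t, 2 * t + 1] = 1.0
    A_imu *= np.sqrt(w_imu)
    b_imu = np.sqrt(w_imu) * bone_length * imu_dirs

    # Smoothness term: penalize frame-to-frame displacement of each joint.
    A_smooth = np.zeros((2 * (T - 1), n))
    for t in range(T - 1):
        for j in range(2):
            A_smooth[2 * t + j, 2 * t + j] = -1.0
            A_smooth[2 * t + j, 2 * (t + 1) + j] = 1.0
    A_smooth *= np.sqrt(w_smooth)
    b_smooth = np.zeros((2 * (T - 1), 3))

    # Stack all residuals and solve the three coordinate axes at once.
    A = np.vstack([A_vis, A_imu, A_smooth])
    b = np.vstack([b_vis, b_imu, b_smooth])
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x.reshape(T, 2, 3)


def jitter(xyz):
    """Mean frame-to-frame displacement over all joints."""
    return float(np.mean(np.linalg.norm(np.diff(xyz, axis=0), axis=-1)))
```

Calling `fuse_bone_trajectory(noisy_positions, imu_directions, bone_length)` on a noisy trajectory and comparing `jitter` before and after gives a rough feel for how the smoothness and inertial terms trade off against fidelity to the visual estimate. A full system would optimize joint rotations of a parametric skeleton rather than free joint positions, which makes the problem nonlinear.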

