Fusing Wearable IMUs with Multi-View Images for Human Pose Estimation: A Geometric Approach (2003.11163v2)

Published 25 Mar 2020 in cs.CV

Abstract: We propose to estimate 3D human pose from multi-view images and a few IMUs attached at person's limbs. It operates by firstly detecting 2D poses from the two signals, and then lifting them to the 3D space. We present a geometric approach to reinforce the visual features of each pair of joints based on the IMUs. This notably improves 2D pose estimation accuracy especially when one joint is occluded. We call this approach Orientation Regularized Network (ORN). Then we lift the multi-view 2D poses to the 3D space by an Orientation Regularized Pictorial Structure Model (ORPSM) which jointly minimizes the projection error between the 3D and 2D poses, along with the discrepancy between the 3D pose and IMU orientations. The simple two-step approach reduces the error of the state-of-the-art by a large margin on a public dataset. Our code will be released at https://github.com/CHUNYUWANG/imu-human-pose-pytorch.

Citations (63)

Summary

  • The paper presents a novel fusion technique that integrates wearable IMUs with multi-view images to enhance human pose estimation accuracy.
  • It employs a two-step process that first refines 2D poses using both sensors and then lifts them to 3D using geometric constraints to handle occlusions.
  • The proposed method significantly reduces 3D position error to 24.6mm on the Total Capture dataset, outperforming previous state-of-the-art results.

Overview of Fusing Wearable IMUs with Multi-View Images for Human Pose Estimation: A Geometric Approach

The paper presents a method for estimating 3D human pose by fusing wearable Inertial Measurement Units (IMUs) with multi-view images. The approach has two components: an Orientation Regularized Network (ORN), which uses limb orientations from the IMUs to reinforce 2D visual features, and an Orientation Regularized Pictorial Structure Model (ORPSM), which lifts the resulting 2D poses to 3D. Injecting IMU orientation data into the feature extraction stage improves the accuracy of both 2D and 3D pose estimation, particularly under joint occlusion.

Methodology

The researchers employ a two-step pose estimation process. First, 2D poses are detected from the images with the help of the IMU signals. Then, using geometric constraints informed by the IMU data, the multi-view 2D poses are lifted to 3D. The ORN uses the IMU orientations as structural priors to mutually enhance the image features of each pair of linked joints: when one joint is occluded, the features of its visible neighbor, transported along the IMU-measured limb direction, still vote for the occluded joint's likely location. The paper describes how sampling candidate joint locations along camera rays, combined with multi-view fusion, resolves occlusions effectively.
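To make the cross-joint fusion idea concrete, here is a minimal, hypothetical sketch in NumPy. It collapses the paper's 3D sampling along camera rays into a single 2D offset per limb (the projection of the IMU-measured limb vector); the helper names (shift2d, fuse_linked_joints) and the blending weight alpha are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def shift2d(hm, dy, dx):
    """Translate a heatmap by integer pixel offsets, zero-padding the borders."""
    h, w = hm.shape
    out = np.zeros_like(hm)
    out[max(dy, 0):h + min(dy, 0), max(dx, 0):w + min(dx, 0)] = \
        hm[max(-dy, 0):h + min(-dy, 0), max(-dx, 0):w + min(-dx, 0)]
    return out

def fuse_linked_joints(h_child, h_parent, limb_offset_px, alpha=0.5):
    """Reinforce the child joint's heatmap with the parent's heatmap shifted
    by the IMU-predicted limb offset (in pixels). If the child is occluded
    but the parent is visible, the shifted parent response still votes for
    the plausible child location."""
    dy, dx = limb_offset_px
    return h_child + alpha * shift2d(h_parent, dy, dx)

# Toy usage: the parent joint is detected at (30, 40) and the IMU says the
# limb points 10 px down and 5 px right in this view; the child is occluded.
h_parent = np.zeros((64, 64)); h_parent[30, 40] = 1.0
h_child = np.zeros((64, 64))
fused = fuse_linked_joints(h_child, h_parent, (10, 5))
print(tuple(map(int, np.unravel_index(fused.argmax(), fused.shape))))  # (40, 45)
```

Even with the child's own evidence entirely missing, the fused heatmap peaks at the location the IMU predicts, which is the intuition behind the ORN's occlusion robustness.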

In the second step, the work extends this geometric approach through the ORPSM, which enforces consistency between estimated 3D limb orientations and the IMU measurements while minimizing the projection error between the 3D pose and the 2D poses. This step augments the Pictorial Structure Model with an orientation regularization term, yielding robust estimates by balancing projection fidelity against orientation consistency.
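The sketch below illustrates the kind of objective the ORPSM minimizes, under stated assumptions: the projection term is modeled as a negative log-likelihood under the 2D heatmaps, and the orientation discrepancy as 1 - cos(angle) between each estimated limb direction and its IMU measurement. The function name, the weight w_ori, and these exact penalty forms are assumptions for illustration; the paper optimizes over a discretized grid of joint locations with a pictorial structure (tree) model, whereas this sketch merely scores one candidate pose:

```python
import numpy as np

def orpsm_energy(pose3d, heatmaps, cameras, limbs, imu_dirs, w_ori=1.0):
    """Score one candidate 3D pose (lower is better).

    pose3d   : (J, 3) candidate joint positions
    heatmaps : per-view arrays of shape (J, H, W) from the 2D pose network
    cameras  : per-view functions mapping a 3D point to (row, col) pixels
    limbs    : (parent, child) joint index pairs instrumented with an IMU
    imu_dirs : unit limb directions measured by the IMUs, one per limb
    """
    # Projection term: each joint should land on a strong heatmap response
    # in every view (negative log-likelihood of the 2D detections).
    e_proj = 0.0
    for hm, cam in zip(heatmaps, cameras):
        for j, p in enumerate(pose3d):
            rc = np.asarray(cam(p))
            r, c = np.clip(rc, 0, np.array(hm.shape[1:]) - 1).astype(int)
            e_proj -= np.log(hm[j, r, c] + 1e-6)

    # Orientation term: each estimated limb direction should agree with the
    # direction measured by its IMU (1 - cosine of the angle between them).
    e_ori = 0.0
    for (pa, ch), d_imu in zip(limbs, imu_dirs):
        v = pose3d[ch] - pose3d[pa]
        v = v / (np.linalg.norm(v) + 1e-9)
        e_ori += 1.0 - float(v @ d_imu)

    return e_proj + w_ori * e_ori

# Toy usage: one orthographic view, two joints, one IMU on the connecting limb.
J, H, W = 2, 64, 64
hm = np.full((J, H, W), 1e-6); hm[0, 10, 10] = 1.0; hm[1, 20, 10] = 1.0
cam = lambda p: p[:2]                      # toy "camera": drop the depth axis
pose = np.array([[10., 10., 0.], [20., 10., 0.]])
imu_dirs = [np.array([1., 0., 0.])]        # limb points along +row
print(orpsm_energy(pose, [hm], [cam], [(0, 1)], imu_dirs))  # ~0: consistent pose
```

In the actual model, an energy of this form would be minimized over candidate joint locations via dynamic programming on the kinematic tree, so the orientation term steers the search toward limb configurations that agree with the IMUs.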

Numerical Results and Claims

The paper reports substantial reductions in pose estimation error. On the Total Capture dataset, ORPSM applied to ORN-estimated 2D poses achieves a 3D position error of 24.6mm, compared with 29mm for the previous state of the art. The gains in 2D pose estimation accuracy are most pronounced for frequently occluded joints such as ankles and wrists, demonstrating the effectiveness of the proposed visual-inertial fusion.

Implications and Future Work

This research suggests a shift from late fusion, where IMU-derived and image-derived poses are combined after estimation, to early fusion, where IMU data enhances visual feature detection from the outset. Such early fusion could improve the robustness of pose estimation systems in applications that demand high accuracy under real-world occlusion.

Future research may refine the fusion process by developing indicators of sensor reliability, so that each modality is weighted by how much it can be trusted. Incorporating temporal dynamics alongside such reliability-aware fusion could help maintain pose accuracy over long sequences and across varied activity types.

Overall, the paper contributes to the field of pose estimation by presenting a systematically refined approach to sensor-integrated visual feature enhancement, paving the way for more precise and reliable human pose estimation systems.
