- The paper presents a novel fusion technique that integrates wearable IMUs with multi-view images to enhance human pose estimation accuracy.
- It employs a two-step process that first estimates 2D poses by fusing IMU orientations with multi-view image features, then lifts them to 3D using geometric constraints that help handle occlusions.
- The proposed method significantly reduces 3D position error to 24.6mm on the Total Capture dataset, outperforming previous state-of-the-art results.
Overview of Fusing Wearable IMUs with Multi-View Images for Human Pose Estimation: A Geometric Approach
The paper presents a novel method for estimating 3D human pose by integrating wearable Inertial Measurement Units (IMUs) with multi-view images. The approach consists of two components: an Orientation Regularized Network (ORN), which uses limb orientations measured by the IMUs to strengthen 2D visual feature detection, especially when joints are occluded, and an Orientation Regularized Pictorial Structure Model (ORPSM), which lifts the result to 3D under the same orientation constraints. Together, these components improve the accuracy of both 2D and 3D pose estimation.
Methodology
The researchers employ a two-step pose estimation process. First, 2D poses are estimated from both image features and IMU orientations. Then, using geometric constraints informed by the IMU data, the 2D poses are lifted to 3D. The ORN uses IMU orientations as structural priors so that pairs of joints linked by a limb mutually enhance each other's image features; this cross-joint fusion helps localize occluded joints from their visible neighbors. Because the depth of a joint along a camera ray is unknown, candidate 3D locations are sampled along the ray through each 2D detection, the IMU-measured limb orientation maps each candidate to the expected position of the linked joint, and the corresponding heatmap responses from all views are fused to resolve occlusions (see the sketch below).
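To make the cross-joint, multi-view fusion concrete, here is a minimal NumPy sketch of how a joint's heatmap response could be reinforced by its linked joint: candidate 3D points are sampled along the camera ray, shifted by the IMU limb orientation, and scored against the linked joint's heatmaps in all views. This is an illustrative reading of the idea, not the paper's implementation; the function names, the depth-sampling inputs, and the max-over-depth, mean-over-view fusion rule are assumptions.

```python
import numpy as np

def backproject_ray(pixel, K_inv, R, t, depths):
    """Return candidate 3D points (world frame) along the camera ray through `pixel`."""
    ray_cam = K_inv @ np.array([pixel[0], pixel[1], 1.0])   # ray direction in camera coords
    cam_center = -R.T @ t                                    # camera center in world coords
    ray_world = R.T @ ray_cam
    ray_world /= np.linalg.norm(ray_world)
    return cam_center[None, :] + depths[:, None] * ray_world[None, :]

def project(points_3d, K, R, t):
    """Project world points into a view; returns rounded pixel coordinates."""
    cam = (R @ points_3d.T + t[:, None]).T
    uv = (K @ cam.T).T
    return (uv[:, :2] / uv[:, 2:3]).round().astype(int)

def fuse_linked_joint(heatmap_j, pixel, cameras, heatmaps_linked,
                      limb_dir, limb_len, depths):
    """Reinforce joint J's heatmap response at `pixel` using its linked joint.

    cameras: list of (K, R, t) per view, with view 0 containing `pixel`.
    heatmaps_linked: the linked joint's heatmap in each view.
    limb_dir: unit limb direction from the IMU (world frame); limb_len: bone length.
    """
    K0, R0, t0 = cameras[0]
    candidates = backproject_ray(pixel, np.linalg.inv(K0), R0, t0, depths)
    linked_pts = candidates + limb_len * limb_dir   # expected 3D positions of the linked joint

    responses = []
    for (K, R, t), hm in zip(cameras, heatmaps_linked):
        uv = project(linked_pts, K, R, t)
        h, w = hm.shape
        valid = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
        vals = np.zeros(len(linked_pts))
        vals[valid] = hm[uv[valid, 1], uv[valid, 0]]
        responses.append(vals)

    # One plausible fusion rule: max over depth hypotheses, mean over views.
    fused = float(np.mean([r.max() for r in responses]))
    return heatmap_j[pixel[1], pixel[0]] + fused
```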
In the second step, the work extends this geometric approach through ORPSM, which enforces consistency between the estimated 3D limb orientations and the IMU measurements while minimizing the discrepancy between the projected 3D pose and the 2D estimates. This step augments the Pictorial Structure Model with an orientation regularization term, enabling robust pose estimation that balances projection agreement against limb-orientation constraints (a sketch of one such objective follows).
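As a rough illustration of an orientation-regularized objective, the sketch below scores a candidate 3D pose by combining heatmap agreement across views with limb-length and IMU-orientation consistency terms. The specific energy, its weights (`w_proj`, `w_len`, `w_ori`), and the limb-length prior are assumptions for illustration; the paper solves a pictorial structure model over a discretized 3D space rather than evaluating a single candidate pose as done here.

```python
import numpy as np

def orpsm_style_energy(joints_3d, limbs, cameras, heatmaps, imu_dirs, limb_lengths,
                       w_proj=1.0, w_len=0.1, w_ori=0.5):
    """Score a candidate 3D pose with an ORPSM-style objective (higher is better).

    joints_3d: (J, 3) candidate joint positions.
    limbs: list of (parent, child) joint index pairs.
    cameras: list of (K, R, t) per view; heatmaps: per-view array of shape (J, H, W).
    imu_dirs: unit limb directions from the IMUs, one per limb (world frame).
    """
    # Unary term: heatmap confidence at the projection of each joint in each view.
    proj_score = 0.0
    for (K, R, t), hms in zip(cameras, heatmaps):
        cam = (R @ joints_3d.T + t[:, None]).T
        uv = (K @ cam.T).T
        uv = (uv[:, :2] / uv[:, 2:3]).round().astype(int)
        for j, (u, v) in enumerate(uv):
            h, w = hms[j].shape
            if 0 <= u < w and 0 <= v < h:
                proj_score += hms[j][v, u]

    # Pairwise terms: limb-length prior and IMU orientation consistency.
    len_penalty, ori_penalty = 0.0, 0.0
    for idx, (a, b) in enumerate(limbs):
        bone = joints_3d[b] - joints_3d[a]
        length = np.linalg.norm(bone)
        len_penalty += (length - limb_lengths[idx]) ** 2
        cos_sim = bone @ imu_dirs[idx] / (length + 1e-8)
        ori_penalty += 1.0 - cos_sim

    return w_proj * proj_score - w_len * len_penalty - w_ori * ori_penalty
```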
Numerical Results and Claims
The paper reports substantial reductions in pose estimation error. On the Total Capture dataset, ORPSM initialized with ORN-estimated 2D poses achieved a mean 3D position error of 24.6mm, compared with 29mm for the previous state of the art. The gains in 2D pose estimation accuracy were largest for frequently occluded joints such as ankles and wrists, demonstrating the effectiveness of the proposed visual-inertial fusion.
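For context, 3D position error on motion-capture benchmarks is commonly reported as the mean per-joint position error (MPJPE), the average Euclidean distance between predicted and ground-truth joints. The snippet below assumes this plain, unaligned definition of the metric, which may differ from the paper's exact evaluation protocol.

```python
import numpy as np

def mpjpe_mm(pred, gt):
    """Mean per-joint position error in millimetres.

    pred, gt: arrays of shape (frames, joints, 3), both in millimetres.
    """
    return float(np.mean(np.linalg.norm(pred - gt, axis=-1)))

# Example: an error of 24.6mm means joints land, on average, about 2.5cm from ground truth.
```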
Implications and Future Work
This research suggests a shift from late fusion, in which IMU-derived and image-derived poses are combined after estimation, toward early fusion, in which IMU data enhances visual feature detection from the outset. Early fusion of this kind could improve the robustness of pose estimation systems in applications that require high accuracy under the occlusions common in real-world conditions.
Future research may refine the fusion process by estimating the reliability of each sensor stream and weighting it accordingly, further improving robustness. Incorporating temporal dynamics alongside such reliability-weighted fusion could help maintain pose accuracy over longer sequences and a wider range of activities.
Overall, the paper contributes to the field of pose estimation by presenting a principled way to enhance visual features with inertial measurements, paving the way for more precise and reliable human pose estimation systems.