Sparse Inertial Poser: Automatic 3D Human Pose Estimation from Sparse IMUs (1703.08014v2)

Published 23 Mar 2017 in cs.CV and cs.GR

Abstract: We address the problem of making human motion capture in the wild more practical by using a small set of inertial sensors attached to the body. Since the problem is heavily under-constrained, previous methods either use a large number of sensors, which is intrusive, or they require additional video input. We take a different approach and constrain the problem by: (i) making use of a realistic statistical body model that includes anthropometric constraints and (ii) using a joint optimization framework to fit the model to orientation and acceleration measurements over multiple frames. The resulting tracker Sparse Inertial Poser (SIP) enables 3D human pose estimation using only 6 sensors (attached to the wrists, lower legs, back and head) and works for arbitrary human motions. Experiments on the recently released TNT15 dataset show that, using the same number of sensors, SIP achieves higher accuracy than the dataset baseline without using any video data. We further demonstrate the effectiveness of SIP on newly recorded challenging motions in outdoor scenarios such as climbing or jumping over a wall.

Summary

The paper introduces Sparse Inertial Poser, a novel method that uses six strategically placed IMUs and anthropometric constraints to capture 3D human poses.
The paper employs a joint optimization framework with a realistic statistical body model to overcome challenges from sparse sensor data.
The paper reports higher accuracy than the TNT15 dataset baseline, using the same number of sensors and no video data, and highlights potential applications in sports analytics, VR, and healthcare.

Sparse Inertial Poser: Automatic 3D Human Pose Estimation from Sparse IMUs

The paper "Sparse Inertial Poser: Automatic 3D Human Pose Estimation from Sparse IMUs" introduces a novel method for human motion capture in unconstrained environments using a minimal set of Inertial Measurement Units (IMUs). Traditional approaches to human motion capture rely either on numerous sensors, which can be intrusive, or on video input, which constrains the capture to specific environments and conditions. This paper presents a method, Sparse Inertial Poser (SIP), that utilizes only six IMUs to accurately estimate 3D human pose, thereby providing a practical and less intrusive alternative for capturing human motion across diverse settings.

Method Overview

The core idea of SIP is to combine a realistic statistical body model with a joint optimization framework to address the under-constrained nature of pose estimation from sparse IMUs. The body model supplies anthropometric constraints, and the optimization fits the model to orientation and acceleration measurements over multiple frames rather than one frame at a time, so full-body pose can be recovered without video input or a large array of sensors. SIP uses six IMUs, attached to the wrists, lower legs, back, and head, and is designed to work for arbitrary human motions.
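To make the idea of fitting to orientation and acceleration over several frames more concrete, the following is a minimal sketch of a per-sensor energy, not the authors' implementation. It assumes that orientation mismatch is scored by the geodesic angle between rotation matrices, that the model's acceleration is approximated by a central second difference of the model's sensor positions, and that measured accelerations are already expressed in the global frame with gravity removed. The function names, the weights `w_ori` and `w_acc`, and the array layouts are all hypothetical. In a full tracker, `R_model` and `pos_model` would come from the body model's forward kinematics as a function of the pose parameters, and a prior term from the statistical model would be added.

```python
import numpy as np


def geodesic_angle(R_a, R_b):
    """Rotation angle in radians between two 3x3 rotation matrices."""
    cos_theta = (np.trace(R_a.T @ R_b) - 1.0) / 2.0
    return float(np.arccos(np.clip(cos_theta, -1.0, 1.0)))


def finite_diff_acceleration(positions, dt):
    """Central second difference of a (T, 3) position track, shape (T-2, 3)."""
    return (positions[:-2] - 2.0 * positions[1:-1] + positions[2:]) / dt**2


def sensor_energy(R_meas, R_model, acc_meas, pos_model, dt, w_ori=1.0, w_acc=1.0):
    """Weighted squared orientation + acceleration residuals for one sensor.

    R_meas, R_model: (T, 3, 3) measured and model sensor orientations.
    acc_meas: (T, 3) measured acceleration (assumed global frame, no gravity).
    pos_model: (T, 3) model sensor positions; dt: frame interval in seconds.
    """
    e_ori = sum(geodesic_angle(a, b) ** 2 for a, b in zip(R_meas, R_model))
    acc_model = finite_diff_acceleration(pos_model, dt)
    # The first and last frames have no central difference, so they are skipped.
    e_acc = float(np.sum((acc_meas[1:-1] - acc_model) ** 2))
    return w_ori * e_ori + w_acc * e_acc


# Toy usage: a sensor moving at constant velocity with a fixed orientation.
T, dt = 5, 0.1
t = np.arange(T) * dt
pos = t[:, None] * np.array([1.0, 0.0, 0.0])
identity = np.tile(np.eye(3), (T, 1, 1))
energy_match = sensor_energy(identity, identity, np.zeros((T, 3)), pos, dt)

# Same track, but the model orientation is off by 90 degrees about z.
Rz90 = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
energy_rotated = sensor_energy(identity, np.tile(Rz90, (T, 1, 1)),
                               np.zeros((T, 3)), pos, dt)
```

Because the acceleration term couples each frame to its neighbours, the energy cannot be minimized one frame at a time, which is one reason a joint optimization over a window of frames is a natural fit here.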

Experimental Results

SIP is evaluated on the TNT15 dataset, where, using the same number of sensors, it achieves higher accuracy than the dataset baseline without using any video data. The authors further demonstrate SIP on newly recorded, challenging motions in outdoor scenarios, such as climbing or jumping over a wall. These results suggest that SIP is applicable to a wide range of motion capture tasks outside controlled environments.
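The summary above does not state which error metric is used. As an illustration only, accuracy comparisons of this kind are often expressed as a mean distance between estimated and reference joint positions. The sketch below assumes that metric and a hypothetical `(T, J, 3)` array layout (frames, joints, xyz); it is not taken from the paper.

```python
import numpy as np


def mean_joint_position_error(est, ref):
    """Mean Euclidean distance between estimated and reference joints.

    est, ref: arrays of shape (T, J, 3), in the same units and coordinate frame.
    """
    est = np.asarray(est, dtype=float)
    ref = np.asarray(ref, dtype=float)
    if est.shape != ref.shape:
        raise ValueError("est and ref must have the same shape")
    # Per-joint, per-frame distance, then averaged over all joints and frames.
    return float(np.linalg.norm(est - ref, axis=-1).mean())


# Toy usage: every joint is offset by a 3-4-5 triangle scaled to 5 cm.
ref_joints = np.zeros((2, 3, 3))
est_joints = ref_joints + np.array([0.03, 0.04, 0.0])
error_m = mean_joint_position_error(est_joints, ref_joints)
```

A single averaged number hides which joints are hardest to recover, so with sparse sensors it can also be informative to report the error per joint.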

Implications and Future Directions

Practically, SIP's ability to capture human motion with minimal sensors broadens the accessibility of motion capture technology, making it suitable for applications in sports analysis, virtual reality, healthcare monitoring, and more. Theoretically, SIP leverages a statistical body model's anthropometric constraints to resolve ambiguities in pose estimation, indicating promising avenues for integrating more comprehensive body models and expanding on the types of human activities that can be captured.

Future work could explore integrating SIP with other systems, such as GPS modules or low-cost vision systems, to enhance global position accuracy and mitigate drift. Moreover, incorporating constraints from environmental interactions or object manipulations could further improve the fidelity of motion capture. By addressing these areas, SIP could extend its utility across even more demanding application scenarios, paving the way for advanced human-computer interaction systems and comprehensive biomechanical analyses in naturalistic settings.