NIKI: Neural Inverse Kinematics with Invertible Neural Networks for 3D Human Pose and Shape Estimation (2305.08590v1)

Published 15 May 2023 in cs.CV

Abstract: With the progress of 3D human pose and shape estimation, state-of-the-art methods can either be robust to occlusions or obtain pixel-aligned accuracy in non-occlusion cases. However, they cannot obtain robustness and mesh-image alignment at the same time. In this work, we present NIKI (Neural Inverse Kinematics with Invertible Neural Network), which models bi-directional errors to improve the robustness to occlusions and obtain pixel-aligned accuracy. NIKI can learn from both the forward and inverse processes with invertible networks. In the inverse process, the model separates the error from the plausible 3D pose manifold for a robust 3D human pose estimation. In the forward process, we enforce the zero-error boundary conditions to improve the sensitivity to reliable joint positions for better mesh-image alignment. Furthermore, NIKI emulates the analytical inverse kinematics algorithms with the twist-and-swing decomposition for better interpretability. Experiments on standard and occlusion-specific benchmarks demonstrate the effectiveness of NIKI, where we exhibit robust and well-aligned results simultaneously. Code is available at https://github.com/Jeff-sjtu/NIKI

Summary

The paper’s main contribution is introducing NIKI, which leverages invertible neural networks to decouple error from plausible human poses for robust 3D estimation.
It uses the twist-and-swing decomposition to break complex rotations into simpler parts, emulating analytical inverse kinematics for better interpretability while still targeting pixel-aligned accuracy under occlusion.
Experimental results on benchmarks like 3DPW and AGORA demonstrate NIKI's superior performance in both occluded and standard scenarios.

The paper introduces a novel method named NIKI, which leverages invertible neural networks (INNs) to tackle the challenging problem of 3D human pose and shape (HPS) estimation from monocular images. The central issue addressed by the authors is the inability of existing methods to balance robustness to occlusions and accuracy in aligning estimated poses with image pixels. Current approaches usually excel in one of these aspects but fail to combine both.

NIKI models the inverse kinematics (IK) problem with INNs, which decouple error information from plausible human pose data and thereby make pose estimation robust. In contrast, analytical IK solvers and standard neural network-based IK methods are sensitive to errors in the input joint positions, particularly under occlusion.
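To make "invertible" concrete, here is a minimal sketch of an affine coupling layer, a common building block of INNs. It is a generic RealNVP-style layer written in NumPy with random, untrained weights; the class name `AffineCoupling` and all sizes are assumptions for illustration, not NIKI's actual architecture. The point it shows is that the same parameters define both directions, so the inverse is exact rather than learned separately.

```python
import numpy as np


class AffineCoupling:
    """Generic affine coupling layer (illustrative, untrained weights)."""

    def __init__(self, dim, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.d = dim // 2              # size of the pass-through half
        out = dim - self.d             # size of the transformed half
        self.W1 = rng.normal(scale=0.5, size=(self.d, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(scale=0.5, size=(hidden, 2 * out))
        self.b2 = np.zeros(2 * out)

    def _scale_shift(self, x1):
        # Small MLP that predicts a log-scale s and a shift t from x1 only.
        h = np.tanh(x1 @ self.W1 + self.b1)
        s, t = np.split(h @ self.W2 + self.b2, 2, axis=-1)
        return np.tanh(s), t           # bound s so exp(s) stays well-behaved

    def forward(self, x):
        x1, x2 = x[..., :self.d], x[..., self.d:]
        s, t = self._scale_shift(x1)
        return np.concatenate([x1, x2 * np.exp(s) + t], axis=-1)

    def inverse(self, y):
        # y1 equals x1, so s and t can be recomputed exactly.
        y1, y2 = y[..., :self.d], y[..., self.d:]
        s, t = self._scale_shift(y1)
        return np.concatenate([y1, (y2 - t) * np.exp(-s)], axis=-1)


layer = AffineCoupling(dim=6)
x = np.random.default_rng(1).normal(size=(4, 6))
x_rec = layer.inverse(layer.forward(x))
print(np.allclose(x, x_rec))
```

Because the first half of the input passes through unchanged, the scale and shift can always be recomputed on the way back. Stacking such layers with the halves swapped between them gives a deeper network that remains bijective.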

The paper details how NIKI improves robustness against occlusions by explicitly decoupling the error information using INNs in two key processes: forward and inverse. In the inverse process, NIKI separates the pose from errors using a latent error embedding, ensuring the rotations are reflective of plausible human poses despite input noise or occlusions. In the forward process, the model enforces zero-error conditions, leading to accurate mesh-image alignment when the joint positions are reliable.
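One way to write the two processes down is sketched below. The notation is an assumption of this sketch and may differ from the paper's: p denotes the input joint positions, R the joint rotations, z the latent error embedding, FK the forward kinematics of the body model, and g with parameters theta the invertible network. For the map to be bijective, the dimension of p has to match the combined dimension of R and z.

```latex
\begin{align}
  % Inverse process: possibly noisy joints -> plausible rotations plus an error embedding
  (R,\; z) &= g_\theta(p) \\
  % Forward process: the exact inverse of the same network
  p &= f_\theta(R,\; z), \qquad f_\theta = g_\theta^{-1} \\
  % Zero-error boundary condition: error-free joints must map to a zero embedding
  f_\theta(R,\; z = 0) = \mathrm{FK}(R)
  \quad &\Longleftrightarrow \quad
  g_\theta\bigl(\mathrm{FK}(R)\bigr) = (R,\; 0)
\end{align}
```

Read this way, the inverse process absorbs whatever in p is inconsistent with a plausible pose into z, while the boundary condition ties the network to exact kinematics whenever the joints are reliable.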

Experimental results demonstrate the effectiveness of NIKI across several standard and occlusion-specific benchmarks. On datasets like 3DPW, AGORA, and novel occlusion-enhanced datasets such as 3DPW-XOCC, NIKI consistently outperforms existing methods, underscoring its robustness to challenging conditions while maintaining high accuracy in less obstructed scenarios.

A critical aspect of NIKI's architecture is its use of the twist-and-swing decomposition, which enhances the interpretability and efficiency of the IK mapping. The decomposition splits each complex rotation into simpler components, which fits naturally with the architecture's reliance on inherently bijective INNs. Comparisons with state-of-the-art techniques show that NIKI performs well under diverse occlusion scenarios without sacrificing performance on standard benchmarks.
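The decomposition itself can be illustrated with unit quaternions. The sketch below is the standard swing-twist factorization q = swing * twist about a chosen axis, written in NumPy; the function names are assumptions for illustration and are not taken from the NIKI codebase. In a body model the twist axis would typically be the bone direction, which is also an assumption here.

```python
import numpy as np


def quat_mul(q, r):
    """Hamilton product of two quaternions in (w, x, y, z) order."""
    w1, v1 = q[0], q[1:]
    w2, v2 = r[0], r[1:]
    w = w1 * w2 - np.dot(v1, v2)
    v = w1 * v2 + w2 * v1 + np.cross(v1, v2)
    return np.concatenate(([w], v))


def quat_conj(q):
    """Conjugate, which is the inverse for a unit quaternion."""
    return np.concatenate(([q[0]], -q[1:]))


def axis_angle_to_quat(axis, angle):
    """Unit quaternion for a rotation of `angle` radians about `axis`."""
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    return np.concatenate(([np.cos(angle / 2.0)], np.sin(angle / 2.0) * axis))


def twist_swing(q, twist_axis):
    """Split unit quaternion q into (swing, twist) with q = swing * twist.

    twist rotates about twist_axis; swing rotates about an axis
    perpendicular to twist_axis.
    """
    a = np.asarray(twist_axis, dtype=float)
    a = a / np.linalg.norm(a)
    # Keep only the part of the rotation axis that lies along twist_axis.
    twist = np.concatenate(([q[0]], np.dot(q[1:], a) * a))
    norm = np.linalg.norm(twist)
    if norm < 1e-12:
        # Pure 180-degree swing: the twist is undefined, so use identity.
        twist = np.array([1.0, 0.0, 0.0, 0.0])
    else:
        twist = twist / norm
    swing = quat_mul(q, quat_conj(twist))
    return swing, twist


# Compose a swing about x with a twist about z, then recover both parts.
q = quat_mul(axis_angle_to_quat([1, 0, 0], 0.7),
             axis_angle_to_quat([0, 0, 1], 0.4))
swing, twist = twist_swing(q, [0, 0, 1])
print(np.allclose(quat_mul(swing, twist), q))
```

The swing determines where the bone points, which joint positions can pin down, while the twist spins the bone about its own axis and leaves the child joint's position unchanged.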

In practical terms, the implications of this research are significant for applications requiring reliable 3D human pose and shape estimation in environments where occlusions are prevalent. These applications range from augmented reality to video surveillance and human-computer interaction.

Looking forward, NIKI's methodology opens pathways for further research into the integration of INNs with other aspects of 3D human modeling, such as dynamic scenes and environmental constraints. Additionally, the exploration of joint learning frameworks where INNs can be combined with other probabilistic models may provide novel solutions for even more robust and interpretable 3D human pose estimations.

In conclusion, NIKI presents a compelling advancement in the field of 3D human pose and shape estimation, leveraging the power of invertible neural networks to simultaneously achieve robustness to occlusions and pixel-aligned accuracy, thus addressing a significant gap in current methodologies.

GitHub

GitHub - Jeff-sjtu/NIKI: Code of "NIKI: Neural Inverse Kinematics with Invertible Neural Networks for 3D Human Pose and Shape Estimation", CVPR 2023 (261 stars)

Tweets

https://twitter.com/jiefengli_jeff/status/1670696117674344448