View-Invariant Probabilistic Embedding for Human Pose (1912.01001v4)

Published 2 Dec 2019 in cs.CV

Abstract: Depictions of similar human body configurations can vary with changing viewpoints. Using only 2D information, we would like to enable vision algorithms to recognize similarity in human body poses across multiple views. This ability is useful for analyzing body movements and human behaviors in images and videos. In this paper, we propose an approach for learning a compact view-invariant embedding space from 2D joint keypoints alone, without explicitly predicting 3D poses. Since 2D poses are projected from 3D space, they have an inherent ambiguity, which is difficult to represent through a deterministic mapping. Hence, we use probabilistic embeddings to model this input uncertainty. Experimental results show that our embedding model achieves higher accuracy when retrieving similar poses across different camera views, in comparison with 2D-to-3D pose lifting models. We also demonstrate the effectiveness of applying our embeddings to view-invariant action recognition and video alignment. Our code is available at https://github.com/google-research/google-research/tree/master/poem.

Authors (6)

Jennifer J. Sun
Jiaping Zhao
Liang-Chieh Chen
Florian Schroff
Hartwig Adam
Ting Liu

Citations (73)

Summary

The paper introduces an approach for learning view-invariant probabilistic embeddings of human pose from 2D joint keypoints. The goal is to let vision algorithms recognize similar human body configurations across different camera viewpoints using only 2D data. This capability is useful for analyzing body movements and human behavior in images and videos.

Core Methodology

The approach avoids explicit 3D pose prediction. Because a 2D pose is a projection from 3D space, a single 2D input can correspond to a range of possible 3D poses, and a deterministic mapping represents this ambiguity poorly. The authors therefore propose probabilistic embeddings that model this input uncertainty. The embedding models are trained with metric learning objectives and are inspired by 2D-to-3D lifting models in that they operate on 2D keypoints alone. Unlike prior approaches, which have predominantly used deterministic mappings to point embeddings, the proposed model outputs a distribution.
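The ambiguity described above can be illustrated with a small sketch. It assumes an orthographic projection that simply drops the depth coordinate, which is a simplification chosen for illustration and not the camera model used in the paper. The joint coordinates and the helper name `project_orthographic` are hypothetical.

```python
import numpy as np


def project_orthographic(pose_3d):
    """Project 3D joints (N, 3) to 2D (N, 2) by dropping depth (assumed camera model)."""
    return pose_3d[:, :2]


# Two hypothetical 3-joint poses that differ only in the depth of the last joint,
# e.g. a hand reaching toward the camera versus away from it.
pose_a = np.array([[0.0, 0.0, 0.0],
                   [0.0, -0.5, 0.0],
                   [0.3, -0.8, 0.4]])
pose_b = pose_a.copy()
pose_b[2, 2] = -0.4

same_2d = np.allclose(project_orthographic(pose_a), project_orthographic(pose_b))
different_3d = not np.allclose(pose_a, pose_b)
print(same_2d, different_3d)
```

Both poses yield identical 2D keypoints even though the 3D configurations differ, which is the kind of input uncertainty a single point embedding cannot express.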

The proposed method, termed Probabilistic View-Invariant Pose Embeddings (Pr-VIPE), contrasts with the point-embedding variant (VIPE) by mapping each 2D pose to a distribution in the embedding space rather than to a single point. This better accounts for the variability in 2D data caused by perspective changes. Pr-VIPE represents each embedding as a multivariate Gaussian distribution. Training combines three terms: a triplet ratio loss, a positive pairwise loss, and a Gaussian prior loss that regularizes the embedding distributions.
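A minimal NumPy sketch of these ingredients follows. It is an illustration under stated assumptions, not the authors' implementation: it assumes a diagonal-covariance Gaussian, a sigmoid-of-distance matching probability with hypothetical scale `a` and offset `b`, a Monte Carlo average over sampled embeddings, a hinge on the log ratio of matching probabilities with a hypothetical ratio `beta`, and a KL divergence to a unit Gaussian as the prior term. All function names are illustrative.

```python
import numpy as np


def sample_embedding(mean, std, num_samples, rng):
    """Draw samples from N(mean, diag(std^2)) via the reparameterization mean + std * eps."""
    eps = rng.standard_normal((num_samples, mean.shape[0]))
    return mean + std * eps


def matching_probability(samples_i, samples_j, a=1.0, b=0.0):
    """Average sigmoid(-a * distance + b) over all sample pairs (assumed form)."""
    diff = samples_i[:, None, :] - samples_j[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    return float(np.mean(1.0 / (1.0 + np.exp(a * dist - b))))


def triplet_ratio_loss(p_pos, p_neg, beta=2.0):
    """Hinge that is zero once p_pos / p_neg >= beta (assumed form)."""
    return max(0.0, float(np.log(beta) - np.log(p_pos) + np.log(p_neg)))


def positive_pairwise_loss(p_pos):
    """Negative log matching probability of an anchor-positive pair (assumed form)."""
    return float(-np.log(p_pos))


def kl_to_unit_gaussian(mean, std):
    """KL( N(mean, diag(std^2)) || N(0, I) ), used here as the prior term."""
    var = std ** 2
    return 0.5 * float(np.sum(var + mean ** 2 - 1.0 - np.log(var)))


# Toy anchor / positive / negative embeddings in a 4-D space (made-up values).
rng = np.random.default_rng(0)
std = np.full(4, 0.1)
z_anchor = sample_embedding(np.zeros(4), std, 20, rng)
z_pos = sample_embedding(np.full(4, 0.1), std, 20, rng)
z_neg = sample_embedding(np.full(4, 3.0), std, 20, rng)

p_pos = matching_probability(z_anchor, z_pos)
p_neg = matching_probability(z_anchor, z_neg)
total = (triplet_ratio_loss(p_pos, p_neg)
         + positive_pairwise_loss(p_pos)
         + kl_to_unit_gaussian(np.zeros(4), std))
```

In an actual training setup the three terms would typically be weighted and differentiated through an automatic differentiation framework; the unweighted sum here only shows how the pieces fit together.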

Experimental Validation and Results

The authors report experiments on several datasets, including Human3.6M and MPI-INF-3DHP (3DHP), evaluating cross-view pose retrieval. Pr-VIPE achieves higher retrieval accuracy than 2D-to-3D lifting models, particularly on datasets not seen during training, such as 3DHP. The embeddings are also effective in downstream applications, namely view-invariant action recognition and video sequence alignment, where they achieve results competitive with state-of-the-art methods designed specifically for those tasks.
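To make the cross-view retrieval setup concrete, the sketch below scores a query set from one camera view against an index set from another view using a hit-at-k style measure. It is a simplified illustration: it ranks by Euclidean distance between point embeddings, and it takes the ground-truth notion of "same pose" as a given boolean matrix `is_match`. The function name, the toy embeddings, and the match matrices are all hypothetical and are not results from the paper.

```python
import numpy as np


def hit_at_k(query_emb, index_emb, is_match, k=1):
    """Fraction of queries with at least one true match among their k nearest index items."""
    dists = np.linalg.norm(query_emb[:, None, :] - index_emb[None, :, :], axis=-1)
    top_k = np.argsort(dists, axis=1)[:, :k]
    hits = np.take_along_axis(is_match, top_k, axis=1).any(axis=1)
    return float(hits.mean())


# Toy embeddings: two queries from view A, three index poses from view B.
query_emb = np.array([[0.0, 0.0], [1.0, 1.0]])
index_emb = np.array([[0.1, 0.0], [1.0, 0.9], [5.0, 5.0]])

# Case 1: each query's nearest neighbor is a true match.
is_match_good = np.array([[True, False, False],
                          [False, True, False]])
# Case 2: the first query's only true match is the far-away index pose.
is_match_bad = np.array([[False, False, True],
                         [False, True, False]])

score_good = hit_at_k(query_emb, index_emb, is_match_good, k=1)
score_bad = hit_at_k(query_emb, index_emb, is_match_bad, k=1)
```

With probabilistic embeddings, the Euclidean distance could be swapped for a similarity computed from sampled embeddings; the ranking and scoring logic would stay the same.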

Practical and Theoretical Implications

The ability to retrieve and recognize poses across diverse viewpoints without needing to compute expensive rigid transformations or relying on image context underscores the potential of Pr-VIPE in real-world applications, such as video surveillance, sports analytics, and human-computer interaction systems. The probabilistic framework also hints at promising avenues for handling input uncertainty in other vision tasks where 2D data serves as a proxy for 3D information.

Speculative Future Directions

Looking ahead, extending probabilistic embeddings to multi-person scenarios and applying them to more complex objects beyond the human body present intriguing opportunities for researchers. Another prospect is exploring more advanced variants of the embedding formulation to improve its predictive capability and its adaptability to different domains. As AI models continue to evolve, the intersection of probabilistic embeddings and LLMs may also offer ground for innovation in multimodal AI systems.

In summary, the paper's probabilistic embeddings offer a substantive advance in recognizing pose similarity across viewpoints. The implications are not confined to computer vision; they extend to broader AI applications and provide a foundation for future research.
