
Unsupervised 3D Human Pose Representation with Viewpoint and Pose Disentanglement (2007.07053v2)

Published 14 Jul 2020 in cs.CV

Abstract: Learning a good 3D human pose representation is important for human pose related tasks, e.g. human 3D pose estimation and action recognition. Within all these problems, preserving the intrinsic pose information and adapting to view variations are two critical issues. In this work, we propose a novel Siamese denoising autoencoder to learn a 3D pose representation by disentangling the pose-dependent and view-dependent feature from the human skeleton data, in a fully unsupervised manner. These two disentangled features are utilized together as the representation of the 3D pose. To consider both the kinematic and geometric dependencies, a sequential bidirectional recursive network (SeBiReNet) is further proposed to model the human skeleton data. Extensive experiments demonstrate that the learned representation 1) preserves the intrinsic information of human pose, 2) shows good transferability across datasets and tasks. Notably, our approach achieves state-of-the-art performance on two inherently different tasks: pose denoising and unsupervised action recognition. Code and models are available at: https://github.com/NIEQiang001/unsupervised-human-pose.git

Citations (65)

Summary

The paper "Unsupervised 3D Human Pose Representation with Viewpoint and Pose Disentanglement" presents a novel approach for learning representations of 3D human poses. It specifically addresses the challenges inherent in preserving intrinsic pose information while adapting to variations in viewpoint. This work is significant for tasks such as 3D human pose estimation and action recognition, where understanding dependencies between joints in a human skeleton from various perspectives is crucial.

The authors introduce a Siamese denoising autoencoder to disentangle pose-dependent and viewpoint-dependent features from 3D skeletal data in an unsupervised manner. They further propose a Sequential Bidirectional Recursive Network (SeBiReNet) to model the kinematic and geometric dependencies present within the human skeletal structure. This network design is particularly suited to capture the complex, hierarchical dependencies between joints that characterize human movement.
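To make the Siamese denoising setup more concrete, the following is a minimal sketch of how a training pair could be prepared: the same 3D skeleton is seen from two viewpoints, each copy is corrupted with noise, and the clean copies serve as reconstruction targets. The toy skeleton, the helper names (`rotate_y`, `add_noise`, `make_siamese_pair`), the choice of a rotation about the vertical axis as the viewpoint change, and the Gaussian noise model are all assumptions made for illustration; the summary does not describe the authors' actual data pipeline or network.

```python
import math
import random

# Hypothetical 5-joint skeleton as (x, y, z) tuples; y is the vertical axis.
TOY_SKELETON = [
    (0.0, 1.0, 0.0),    # pelvis
    (-0.1, 1.0, 0.0),   # right hip
    (0.1, 1.0, 0.02),   # left hip
    (0.0, 1.5, 0.0),    # neck
    (0.3, 1.4, 0.1),    # hand
]

def rotate_y(skeleton, angle):
    """Rotate every joint about the vertical y-axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in skeleton]

def add_noise(skeleton, sigma, rng):
    """Corrupt each coordinate with zero-mean Gaussian noise (assumed noise model)."""
    return [tuple(v + rng.gauss(0.0, sigma) for v in joint) for joint in skeleton]

def make_siamese_pair(skeleton, sigma=0.01, rng=None):
    """Build two noisy views of one pose, plus their clean reconstruction targets."""
    rng = rng or random.Random(0)
    angle_a = rng.uniform(-math.pi, math.pi)
    angle_b = rng.uniform(-math.pi, math.pi)
    clean_a = rotate_y(skeleton, angle_a)
    clean_b = rotate_y(skeleton, angle_b)
    return {
        "input_a": add_noise(clean_a, sigma, rng),   # fed to branch A
        "input_b": add_noise(clean_b, sigma, rng),   # fed to branch B
        "target_a": clean_a,                         # denoising target for A
        "target_b": clean_b,                         # denoising target for B
        "angles": (angle_a, angle_b),
    }

pair = make_siamese_pair(TOY_SKELETON)
print(len(pair["input_a"]), "joints per view")
```

Because both branches see the same underlying pose, any feature that differs between them can only come from the viewpoint or the noise, which is the intuition behind separating pose-dependent from view-dependent features.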

Extensive experiments demonstrate that the learned representation effectively preserves the intrinsic properties of human poses and exhibits robust transferability across different datasets and tasks. Notably, the architecture achieves state-of-the-art performance in pose denoising and unsupervised action recognition tasks. Such results underscore the effectiveness of the disentanglement strategy, which overcomes challenges associated with view variability by separately modeling view-sensitive and view-invariant features.
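The distinction between view-sensitive and view-invariant features can be illustrated without any learning. The sketch below splits a skeleton by hand into a yaw angle (view-sensitive) and a canonical, front-facing pose (view-invariant), using the direction of the hip line as the reference. This is only a geometric analogy under stated assumptions: the toy skeleton, the joint indices, and the function `split_view_and_pose` are hypothetical, and the paper's representation is learned by a network rather than computed this way.

```python
import math

# Hypothetical 5-joint skeleton as (x, y, z) tuples; y is the vertical axis.
TOY_SKELETON = [
    (0.0, 1.0, 0.0),    # 0: pelvis
    (-0.1, 1.0, 0.0),   # 1: right hip
    (0.1, 1.0, 0.02),   # 2: left hip
    (0.0, 1.5, 0.0),    # 3: neck
    (0.3, 1.4, 0.1),    # 4: hand
]
RIGHT_HIP, LEFT_HIP = 1, 2  # assumed joint indices

def rotate_y(skeleton, angle):
    """Rotate every joint about the vertical y-axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in skeleton]

def split_view_and_pose(skeleton):
    """Return (yaw, canonical_pose): yaw is the hip line's angle in the x-z
    plane, and canonical_pose is the skeleton rotated so that angle is zero."""
    rx, _, rz = skeleton[RIGHT_HIP]
    lx, _, lz = skeleton[LEFT_HIP]
    yaw = math.atan2(lz - rz, lx - rx)
    # rotate_y(angle) lowers the x-z angle by `angle`, so rotating by yaw
    # aligns the hip line with the +x axis.
    return yaw, rotate_y(skeleton, yaw)

# The same pose observed from a camera rotated by 1.1 radians.
other_view = rotate_y(TOY_SKELETON, 1.1)

yaw_a, pose_a = split_view_and_pose(TOY_SKELETON)
yaw_b, pose_b = split_view_and_pose(other_view)

same_pose = all(
    math.isclose(a, b, abs_tol=1e-9)
    for p, q in zip(pose_a, pose_b)
    for a, b in zip(p, q)
)
print("canonical poses match:", same_pose)
print("yaw angles differ:", not math.isclose(yaw_a, yaw_b))
```

In this toy setting the canonical pose is identical for both views while the yaw angle absorbs the viewpoint change, which mirrors the role the summary assigns to the view-invariant and view-sensitive parts of the learned representation.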

Implications and Future Directions

The implications of this research are twofold, spanning practical applications and the theoretical understanding of human pose analysis. Practically, accurately disentangling pose and viewpoint features could improve systems for human-robot interaction, surveillance, and healthcare by making the recognition and analysis of human behavior more robust to changes in viewpoint. Theoretically, the approach contributes to the ongoing discussion of representation learning by emphasizing minimal information loss and maximal feature disentanglement.

Looking forward, this research opens avenues for AI systems capable of richer human-machine interaction. Future work could target real-time applications and incorporate environmental factors that affect human motion. Extending the evaluation to more dynamic and complex datasets would also test the generalizability and scalability of the model.

The results achieved in this paper represent an incremental yet valuable improvement in understanding and modeling 3D human pose. The feature disentanglement approach is well-founded and could influence similar applications in other areas of AI, which would extend its utility beyond pose estimation and action recognition.
