Articulated Object Pose Estimation from Depth Images
The paper "Category-Level Articulated Object Pose Estimation" presents a comprehensive approach for estimating poses of articulated objects at the category level using a single depth image. This work addresses the challenge of predicting the per-part pose and joint parameters of novel articulated object instances within known categories, tackling the limitations of traditional methods reliant on rigid object assumptions or specific CAD models.
Contribution Overview
The authors introduce the Articulation-aware Normalized Coordinate Space Hierarchy (ANCSH), a novel canonical representation designed for articulated objects. ANCSH comprises two hierarchical levels (a minimal normalization sketch follows the list):
- Normalized Articulated Object Coordinate Space (NAOCS) at the root level, which normalizes the object's scale, orientation, and articulation state.
- Normalized Part Coordinate Spaces (NPCS) at the leaf level, which further normalize each part's pose and size.
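To make the part-level normalization concrete, here is a minimal NumPy sketch of one plausible NPCS-style mapping: center a part's points at their bounding-box center and scale by the bounding-box diagonal so the part fits in a unit cube. The exact centering and scaling conventions are assumptions for illustration, not the paper's precise definition.

```python
import numpy as np

def normalize_part_points(points: np.ndarray) -> np.ndarray:
    """Map one part's 3D points into a normalized unit space.

    Hypothetical NPCS-style normalization: center the points at their
    axis-aligned bounding-box center and scale by the bbox diagonal so
    the part fits inside a unit cube. The paper's exact centering and
    scaling conventions may differ.
    """
    bbox_min, bbox_max = points.min(axis=0), points.max(axis=0)
    center = (bbox_min + bbox_max) / 2.0
    diagonal = np.linalg.norm(bbox_max - bbox_min)
    return (points - center) / diagonal + 0.5  # roughly within [0, 1]^3

# Example: normalize a random point cloud for one part.
part_points = np.random.rand(1024, 3) * 5.0 + 2.0
npcs_coords = normalize_part_points(part_points)
```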
The paper develops a deep learning model based on PointNet++ that predicts ANCSH representations from depth data, enabling the network to generalize across intra-category variation by standardizing how object parts and joint parameters are represented.
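As an illustration of the prediction structure only (not the paper's actual architecture), the sketch below stands in a plain per-point MLP for the PointNet++ backbone and shows plausible per-point output heads; all head names and dimensions are assumptions.

```python
import torch
import torch.nn as nn

class ANCSHHeads(nn.Module):
    """Per-point prediction heads in the style of an ANCSH network.

    NOTE: structural sketch only. The paper uses a PointNet++ backbone;
    here a per-point MLP stands in for it, and all head names and
    dimensions are illustrative assumptions.
    """

    def __init__(self, num_parts: int, feat_dim: int = 128):
        super().__init__()
        # Placeholder backbone: a per-point MLP instead of PointNet++.
        self.backbone = nn.Sequential(
            nn.Linear(3, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, feat_dim), nn.ReLU(),
        )
        self.seg_head = nn.Linear(feat_dim, num_parts)       # part labels
        self.npcs_head = nn.Linear(feat_dim, 3 * num_parts)  # NPCS coords per part
        self.naocs_head = nn.Linear(feat_dim, 3)             # NAOCS coords
        self.joint_head = nn.Linear(feat_dim, 3)             # joint-axis votes

    def forward(self, points: torch.Tensor) -> dict:
        feats = self.backbone(points)  # (N, feat_dim)
        return {
            "seg_logits": self.seg_head(feats),
            "npcs": self.npcs_head(feats),
            "naocs": self.naocs_head(feats),
            "joint_axis": self.joint_head(feats),
        }

# Example: run the heads on a dummy 1024-point cloud.
model = ANCSHHeads(num_parts=2)
outputs = model(torch.rand(1024, 3))
```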
These predictions feed a combined optimization that ensures the recovered part poses conform to the kinematic constraints encoded by the joint models. This significantly improves pose accuracy, particularly for objects with movable parts connected by hinges, such as the temples of eyeglasses or the lid of a laptop.
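A standard building block for this kind of pose fitting is the closed-form Umeyama alignment, which recovers a 7-DoF similarity transform (scale, rotation, translation) from corresponding points. The sketch below shows that building block under the assumption that one transform is fit per part from NPCS to camera space; the paper's combined optimization additionally couples the parts through the joint constraints, which is omitted here.

```python
import numpy as np

def umeyama_similarity(src: np.ndarray, dst: np.ndarray):
    """Closed-form similarity transform (scale, R, t) mapping src -> dst.

    Standard Umeyama alignment. In an ANCSH-style pipeline, one such
    7-DoF transform would be fit per part from its predicted NPCS
    coordinates to the observed camera-space points; the extra energy
    terms that couple parts through the joints are omitted here.
    """
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst
    cov = dst_c.T @ src_c / len(src)
    U, S, Vt = np.linalg.svd(cov)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])
    R = U @ D @ Vt                                   # proper rotation
    scale = np.trace(np.diag(S) @ D) / src_c.var(axis=0).sum()
    t = mu_dst - scale * R @ mu_src
    return scale, R, t

# Example: recover a known similarity transform from noiseless points.
rng = np.random.default_rng(0)
src = rng.normal(size=(100, 3))
R_true, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(R_true) < 0:                        # keep it a rotation
    R_true[:, 0] *= -1.0
dst = 2.0 * src @ R_true.T + np.array([1.0, -2.0, 0.5])
scale, R, t = umeyama_similarity(src, dst)           # scale ~= 2.0
```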
Insights and Results
The authors show that the algorithm outperforms baselines that estimate each part's pose independently, demonstrating the value of leveraging joint constraints. The paper provides detailed quantitative evaluations, reporting per-part rotation error, translation error, and 3D IoU, alongside joint angle and joint translation errors; a sketch of these standard metrics appears below.
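For reference, these are common definitions of the rotation, translation, and joint-axis error metrics; the paper's exact conventions (e.g., whether axis sign is treated symmetrically) are assumptions here.

```python
import numpy as np

def rotation_error_deg(R_pred: np.ndarray, R_gt: np.ndarray) -> float:
    """Geodesic angle (degrees) between two rotation matrices."""
    cos = (np.trace(R_pred @ R_gt.T) - 1.0) / 2.0
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

def translation_error(t_pred: np.ndarray, t_gt: np.ndarray) -> float:
    """Euclidean distance between predicted and ground-truth translations."""
    return float(np.linalg.norm(t_pred - t_gt))

def joint_axis_error_deg(a_pred: np.ndarray, a_gt: np.ndarray) -> float:
    """Angle (degrees) between joint axes; sign-agnostic via abs()."""
    cos = abs(a_pred @ a_gt) / (np.linalg.norm(a_pred) * np.linalg.norm(a_gt))
    return float(np.degrees(np.arccos(np.clip(cos, 0.0, 1.0))))
```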
The experiments highlight the benefit of the combined optimization with joint constraints:
- Enhanced accuracy in part position and orientation predictions for unseen object instances.
- Superior joint-parameter predictions in camera space, obtained by predicting joint axes in NAOCS and mapping them into camera space (see the sketch after this list).
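A minimal sketch of that mapping, assuming the NAOCS-to-camera relation is a similarity transform: direction vectors transform by the rotation alone, while a pivot point on the axis takes the full transform. Function and argument names are illustrative, not the paper's API.

```python
import numpy as np

def joint_to_camera(axis_naocs, pivot_naocs, scale, R, t):
    """Map a joint (axis direction + pivot point) from NAOCS to camera space.

    Assumes the NAOCS-to-camera mapping is a similarity transform
    (scale, R, t): the axis direction is only rotated, while points
    on the axis undergo the full transform. Illustrative sketch.
    """
    axis_cam = R @ axis_naocs
    axis_cam = axis_cam / np.linalg.norm(axis_cam)
    pivot_cam = scale * (R @ pivot_naocs) + t
    return axis_cam, pivot_cam
```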
Empirical results underscore the algorithm's consistency across several object categories, including eyeglasses, laptops, ovens, and washing machines. The method remains robust under significant occlusion and shape ambiguity, and it improves substantially over existing methods even on real-world datasets.
Implications and Future Directions
This research provides a structured model for addressing the complexities of articulated object pose estimation, moving beyond rigid constraints to accommodate real-world variability in object articulation. Through its novel use of canonical spaces and integrated constraint optimization, it paves the way for advancements in robotic manipulation, autonomous navigation, and augmented reality applications where accurate perception of articulated objects is vital.
Future directions could extend ANCSH to more complex joint types and higher-dimensional kinematic structures beyond the prevalent revolute and prismatic joints. Additionally, integrating RGB appearance or other multi-modal inputs alongside depth could further improve robustness and precision, particularly under challenging real-world conditions with variable lighting and occlusion.
In conclusion, the presented work constitutes a meaningful step in articulated object pose estimation, offering a scalable framework ready to influence broader AI developments in perception and robotics.