Learning Joint Reconstruction of Hands and Manipulated Objects
The paper "Learning Joint Reconstruction of Hands and Manipulated Objects" presents a comprehensive method for simultaneously estimating 3D hand poses and object shapes from RGB images. This work addresses the challenge of occlusions in hand-object interactions and leverages physical constraints inherent in manipulation tasks to enhance reconstruction accuracy.
Methodology
The authors propose an end-to-end learnable model that integrates a differentiable layer based on the MANO hand model, allowing the network to output anthropomorphically valid hand meshes. The model consists of two main branches: one dedicated to hand pose estimation and the other to object reconstruction. The hand branch predicts MANO pose parameters in a reduced PCA space, together with shape coefficients, to represent hand configurations compactly.
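To make the architecture concrete, the following PyTorch sketch outlines a two-branch network of this kind. It is illustrative rather than the authors' released code: the shared ResNet-18 encoder, the layer sizes, the point-set object head (the paper uses an AtlasNet-style mesh decoder), and the `mano_layer` interface (standing in for a differentiable MANO layer such as manopth's ManoLayer) are all assumptions.

```python
# Minimal sketch (not the authors' exact code) of a two-branch hand-object network:
# a shared image encoder feeds a hand branch (MANO parameters in a reduced PCA
# pose space plus shape coefficients) and an object branch.
import torch
import torch.nn as nn
import torchvision


class HandObjectNet(nn.Module):
    def __init__(self, mano_layer, n_pca=30, n_shape=10, n_obj_points=642):
        super().__init__()
        backbone = torchvision.models.resnet18(weights=None)
        backbone.fc = nn.Identity()          # keep the 512-d global image feature
        self.encoder = backbone
        self.mano_layer = mano_layer         # differentiable MANO layer (hypothetical interface)
        # Hand branch: PCA pose coefficients and shape betas
        # (global rotation/translation omitted for brevity).
        self.hand_head = nn.Linear(512, n_pca + n_shape)
        # Object branch: a simple point-set decoder stands in for the paper's
        # AtlasNet-style mesh decoder.
        self.obj_head = nn.Sequential(
            nn.Linear(512, 1024), nn.ReLU(),
            nn.Linear(1024, n_obj_points * 3),
        )
        self.n_pca, self.n_shape = n_pca, n_shape

    def forward(self, image):
        feat = self.encoder(image)                    # (B, 512)
        hand_params = self.hand_head(feat)
        pose_pca = hand_params[:, : self.n_pca]       # reduced PCA pose space
        betas = hand_params[:, self.n_pca :]          # MANO shape coefficients
        hand_verts, hand_joints = self.mano_layer(pose_pca, betas)
        obj_points = self.obj_head(feat).view(feat.shape[0], -1, 3)
        return hand_verts, hand_joints, obj_points
```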
A novel contribution is the contact loss, which comprises two components: a repulsion term that penalizes interpenetration and an attraction term that encourages contact between the hand and the object surface. This loss pushes the network toward physically plausible hand-object interactions during manipulation.
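The sketch below illustrates one way such attraction and repulsion terms can be written. It is a simplification under stated assumptions (point-to-point distances, outward object normals, a hypothetical distance threshold); the paper restricts attraction to predefined contact regions of the hand and detects penetration against the full object mesh.

```python
# Simplified sketch of a contact loss with attraction and repulsion terms,
# approximated with point-to-point distances for clarity.
import torch


def contact_loss(hand_verts, obj_verts, obj_normals, attract_thresh=0.01):
    """hand_verts: (B, Vh, 3), obj_verts: (B, Vo, 3), obj_normals: (B, Vo, 3)."""
    # Pairwise distances between hand and object vertices: (B, Vh, Vo).
    dists = torch.cdist(hand_verts, obj_verts)
    min_dists, nearest = dists.min(dim=2)                     # (B, Vh)

    # Attraction: pull hand vertices that are already close onto the surface,
    # encouraging contact without dragging distant vertices.
    attraction = torch.where(
        min_dists < attract_thresh, min_dists, torch.zeros_like(min_dists)
    ).mean()

    # Repulsion: penalize hand vertices lying on the inner side of the object
    # surface (negative offset along the nearest object vertex's outward normal).
    idx = nearest.unsqueeze(-1).expand(-1, -1, 3)
    nearest_pts = torch.gather(obj_verts, 1, idx)
    nearest_nrm = torch.gather(obj_normals, 1, idx)
    signed = ((hand_verts - nearest_pts) * nearest_nrm).sum(dim=2)  # (B, Vh)
    repulsion = torch.relu(-signed).mean()

    return attraction, repulsion
```

During training, the two terms would be weighted and added to the hand and object reconstruction losses.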
Dataset
A new large-scale synthetic dataset, ObMan, is introduced to support training and evaluation. It contains diverse hand-object configurations generated with the GraspIt! simulator, which automates the creation of plausible grasp poses. The dataset's scale and diversity make it possible to train deep networks and support transfer to real-world imagery.
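For illustration, a sample in such a synthetic dataset can be thought of as bundling the rendered image with full hand and object ground truth. The field names below are hypothetical and do not reflect the released ObMan file format.

```python
# Hypothetical structure of one synthetic hand-object sample (illustrative only).
from dataclasses import dataclass
import numpy as np


@dataclass
class HandObjectSample:
    rgb: np.ndarray            # (H, W, 3) rendered image
    hand_pose_pca: np.ndarray  # (n_pca,) MANO pose coefficients
    hand_shape: np.ndarray     # (10,) MANO shape coefficients
    hand_verts: np.ndarray     # (778, 3) ground-truth hand mesh vertices
    obj_verts: np.ndarray      # (Vo, 3) ground-truth object mesh vertices
    obj_faces: np.ndarray      # (Fo, 3) object mesh faces
    grasp_quality: float       # grasp quality score from the simulator
```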
Results
The paper demonstrates the effectiveness of the proposed model on several key metrics. Grasp quality improves markedly over baseline methods, as evidenced by reduced penetration depth between the hand and object meshes and smaller object displacement in physics simulation. Adding the contact loss further enhances the physical realism of the hand-object interactions.
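As a rough illustration of the penetration metric, the sketch below computes the maximum depth at which any hand vertex lies inside the object mesh, using trimesh's signed-distance query. This is a simplified stand-in rather than the authors' evaluation code; the simulation-displacement metric additionally requires a physics engine and is not shown.

```python
# Sketch of a maximum-penetration-depth metric between a hand mesh and an object mesh.
import numpy as np
import trimesh


def max_penetration_depth(hand_verts, obj_verts, obj_faces):
    """hand_verts: (Vh, 3), obj_verts: (Vo, 3), obj_faces: (Fo, 3) arrays."""
    obj_mesh = trimesh.Trimesh(vertices=obj_verts, faces=obj_faces, process=False)
    # trimesh's signed distance is positive for points inside the mesh, negative outside.
    signed = trimesh.proximity.signed_distance(obj_mesh, hand_verts)
    return float(max(signed.max(), 0.0))
```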
Transfer learning experiments highlight the value of pre-training on synthetic data for tasks on real images, particularly when real training data is scarce. A benchmark on the StereoHands dataset confirms that hand pose estimation with the proposed model is competitive with state-of-the-art methods.
Implications
The work advances the understanding of hand-object interaction by reconstructing both the hand and the object jointly during manipulation, where occlusions are severe. Practically, this has potential applications in virtual and augmented reality and in robotics, where interaction with physical objects is central. Theoretically, the integration of physical constraints into learning frameworks offers a promising direction for further work in computer vision.
Future Work
Future research could focus on enhancing the generalization of hand-object interaction models to more complex and dynamic actions. Learning grasp affordances from large-scale visual data could provide insights into robust robotic manipulation in diverse settings. Investigating deeper integration of physical laws, such as those governing deformable objects, could also lead to more accurate reconstructions.
In summary, the paper provides a solid foundation for future exploration in modeling and understanding hand-object interactions, with a robust framework and promising results on both synthetic and real data.