Analysis of the GRAB Dataset and Its Implications for Human-Object Interaction Modeling
The paper "GRAB: A Dataset of Whole-Body" introduces a uniquely comprehensive dataset designed to enhance the modeling of human-object interactions, particularly focusing on whole-body grasps. It addresses a significant gap in the existing datasets by providing intricate 3D shapes and poses of human bodies as they interact with objects of various forms and dimensions. This detailed capture allows for a robust analysis and understanding of the dynamic interactions and contact points not limited to hands but extending to the entire body, including intricate motions involving the face and other parts.
Key Contributions and Methodology
The GRAB dataset is distinguished by several innovative features:
- Comprehensive Motion Capture (MoCap) Technique: The dataset uses high-resolution MoCap to track the complete 3D shape and motion of the human body, including facial expressions and hand articulation, capturing fine interaction details and dynamic movements.
- Rich Annotation of Contact: By generating 3D body meshes with MoSh++, the dataset infers contact areas between objects and the various body parts. This granular data surpasses prior attempts to study human-object grasping, offering a more complete picture that is crucial for realistically modeling these interactions (see the contact-labeling sketch after this list).
- Diverse Object Interactions: The dataset encompasses interactions across different contexts such as lifting, handing over, and using objects, providing a broad perspective on how different tasks influence grasp techniques.
- Application in Predictive Modeling: The authors demonstrate the practical applicability of GRAB by training GrabNet, a conditional generative network that predicts plausible 3D hand grasps for unseen objects. This indicates the dataset's utility for driving machine-learning models toward realistic hand-object interaction prediction (a stripped-down sketch of such a conditional model also follows the list).
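The contact annotation above is computed geometrically from the fitted meshes: body vertices that lie sufficiently close to the object surface are flagged as contact points. Below is a minimal sketch of that thresholding idea in Python, assuming per-frame vertex arrays and a hypothetical 4 mm threshold; the paper's actual mesh-proximity procedure and thresholds may differ.

```python
import numpy as np
from scipy.spatial import cKDTree

def label_contact(body_verts, object_verts, threshold=0.004):
    """Label body vertices as 'in contact' when they lie within
    `threshold` meters of the object (approximated here by the
    object's vertices; a true mesh-distance query would be tighter).

    body_verts:   (N, 3) body mesh vertices for one frame
    object_verts: (M, 3) object mesh vertices for the same frame
    Returns a boolean mask of shape (N,).
    """
    tree = cKDTree(object_verts)             # spatial index over the object
    dists, _ = tree.query(body_verts, k=1)   # nearest object vertex per body vertex
    return dists < threshold                 # contact = close enough to the surface

# Toy usage with random points standing in for real meshes
rng = np.random.default_rng(0)
body = rng.uniform(-1, 1, size=(6890, 3))
obj = rng.uniform(-0.1, 0.1, size=(2000, 3))
mask = label_contact(body, obj)
print(f"{mask.sum()} of {mask.size} body vertices flagged as contact")
```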
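GrabNet itself is a conditional variational autoencoder: it encodes hand parameters together with an object-shape representation into a latent code, and at test time samples that latent space conditioned on a new object to generate grasps. The following PyTorch sketch is illustrative only; the dimensions, the flat object encoding, and the single-stage decoder are assumptions, and it omits the paper's coarse-to-fine refinement stage.

```python
import torch
import torch.nn as nn

class ConditionalGraspVAE(nn.Module):
    """Toy conditional VAE: encode (hand params, object code) into a
    latent, decode (latent, object code) back to hand params.
    All dimensions are illustrative, not GrabNet's."""
    def __init__(self, hand_dim=61, obj_dim=1024, latent_dim=16, hidden=256):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Linear(hand_dim + obj_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * latent_dim),   # outputs (mu, logvar)
        )
        self.dec = nn.Sequential(
            nn.Linear(latent_dim + obj_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hand_dim),
        )
        self.latent_dim = latent_dim

    def forward(self, hand, obj):
        mu, logvar = self.enc(torch.cat([hand, obj], dim=-1)).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        recon = self.dec(torch.cat([z, obj], dim=-1))
        return recon, mu, logvar

    @torch.no_grad()
    def sample(self, obj):
        """Generate a grasp for an unseen object code by sampling the prior."""
        z = torch.randn(obj.shape[0], self.latent_dim)
        return self.dec(torch.cat([z, obj], dim=-1))

# Usage: sample hand-parameter vectors for a batch of object codes
model = ConditionalGraspVAE()
obj_code = torch.randn(4, 1024)       # e.g. a BPS-style shape encoding (assumed)
hand_params = model.sample(obj_code)  # (4, 61) generated grasp parameters
```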
Numerical Results and Evaluation
The grasps generated by GrabNet show promising realism and are evaluated both quantitatively and through a user study. The evaluation indicates that the generated grasps are often rated as plausible as real grasps captured in the dataset, validating the model's effectiveness and the utility of the GRAB dataset.
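One common quantitative check for generated grasps, though not necessarily the paper's exact protocol, is measuring how far the predicted hand interpenetrates the object mesh. A small sketch using trimesh's signed-distance query (positive inside the mesh, by trimesh's convention):

```python
import numpy as np
import trimesh

def max_penetration(object_mesh, hand_verts):
    """Depth (in meters) by which hand vertices sink into a watertight
    object mesh; 0.0 means no interpenetration."""
    sd = trimesh.proximity.signed_distance(object_mesh, hand_verts)
    return float(max(sd.max(), 0.0))

# Toy usage: a unit sphere as the object, two probe points as the "hand"
sphere = trimesh.creation.icosphere(radius=0.05)  # 5 cm sphere
probes = np.array([[0.0, 0.0, 0.04],              # 1 cm inside the surface
                   [0.0, 0.0, 0.10]])             # well outside
print(f"max penetration: {max_penetration(sphere, probes):.3f} m")
```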
Practical and Theoretical Implications
This dataset has substantial implications for several domains:
- Robotics and Human-Computer Interaction: The ability to simulate and predict whole-body grasps can significantly enhance the development of robotic systems and virtual avatars that need to interact with their environments in human-like ways.
- Ergonomics and Design: By understanding how humans naturally manipulate objects, designers can create better tools, devices, and interfaces that are optimally suited to human physical interaction patterns.
- AI and Computer Vision: The dataset and the accompanying models can drive progress in AI models aimed at more sophisticated understanding and reconstruction of 3D human-object interaction from visual data.
Future Directions
The GRAB dataset is a foundational step towards a more nuanced understanding of human-object interactions. Future directions could include extending the dataset to incorporate more diverse scenarios and contexts, possibly integrating synchronized imaging data for multimodal research. Additionally, incorporating such datasets into larger AI frameworks could push the envelope in realistic simulation and synthetic data generation, thereby further bridging the gap between robotic perception and human-like interaction.
In summary, the GRAB dataset represents a critical advancement in capturing the complexities of whole-body human-object interaction. Its application in training models like GrabNet underscores its potential in expanding the horizons of computer vision and AI by providing rich data for realistic modeling of human grasps.