SAGA: Stochastic Whole-Body Grasping with Contact (2112.10103v3)

Published 19 Dec 2021 in cs.CV

Abstract: The synthesis of human grasping has numerous applications including AR/VR, video games and robotics. While methods have been proposed to generate realistic hand-object interaction for object grasping and manipulation, these typically consider only the interacting hand. Our goal is to synthesize whole-body grasping motions. Starting from an arbitrary initial pose, we aim to generate diverse and natural whole-body human motions to approach and grasp a target object in 3D space. This task is challenging as it requires modeling both whole-body dynamics and dexterous finger movements. To this end, we propose SAGA (StochAstic whole-body Grasping with contAct), a framework which consists of two key components: (a) Static whole-body grasping pose generation. Specifically, we propose a multi-task generative model to jointly learn static whole-body grasping poses and human-object contacts. (b) Grasping motion infilling. Given an initial pose and the generated whole-body grasping pose as the start and end of the motion respectively, we design a novel contact-aware generative motion infilling module to generate a diverse set of grasp-oriented motions. We demonstrate the effectiveness of our method, a novel generative framework for synthesizing realistic and expressive whole-body motions that approach and grasp randomly placed unseen objects. Code and models are available at https://jiahaoplus.github.io/SAGA/saga.html.

Authors (7)

Yan Wu
Jiahao Wang
Yan Zhang
Siwei Zhang
Otmar Hilliges
Fisher Yu
Siyu Tang

Citations (66)


Summary

Synthesis of Human Whole-Body Grasping Motions Through SAGA Framework

The paper presents a novel approach to synthesizing realistic whole-body grasping motions through a framework named SAGA (Stochastic Whole-Body Grasping with Contact). Targeting applications such as AR/VR, video games, and robotics, SAGA generates diverse and natural whole-body human motions that start from an arbitrary initial pose and then approach and grasp a target object in 3D space. The task is challenging because it requires modeling both whole-body dynamics and dexterous finger movements.

Framework Overview

The SAGA framework consists of two primary components:

Static Whole-Body Grasping Pose Generation: A multi-task generative model jointly learns static whole-body grasping poses and the contacts between the human body and the object. Learning contacts alongside the pose gives an explicit signal for where the body should touch the object, which the later contact-based refinement relies on to produce realistic interaction.
Grasping Motion Infilling: Given the initial pose and the synthesized grasp pose as the start and end of the motion, a contact-aware generative motion infilling module synthesizes the frames in between. Because the module is generative, sampling it repeatedly yields a diverse set of grasp-oriented motions for the same pair of endpoints.
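To make the multi-task idea concrete, below is a minimal NumPy sketch of a CVAE-style training objective that combines a body-marker reconstruction term, a contact classification term, and a KL term. The function names, the choice of L1 and binary cross-entropy, and the loss weights are illustrative assumptions rather than the paper's published loss; the encoder and decoder networks, including the conditioning on the target object, are omitted so that only the loss arithmetic is shown.

```python
import numpy as np


def reparameterize(mu, logvar, rng):
    """Sample z ~ N(mu, diag(exp(logvar))) with the reparameterization trick."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps


def kl_to_standard_normal(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over latent dims, averaged over batch."""
    kl = 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=-1)
    return float(np.mean(kl))


def binary_cross_entropy(p, y, eps=1e-7):
    """Mean BCE between predicted contact probabilities p and 0/1 labels y."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(np.mean(-(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))))


def multitask_cvae_loss(markers_pred, markers_gt, contact_pred, contact_gt,
                        mu, logvar, w_marker=1.0, w_contact=1.0, w_kl=1e-3):
    """Hypothetical weighted sum of the three terms; the weights are illustrative only."""
    # L1 reconstruction of 3D marker positions, shape (batch, n_markers, 3)
    l_marker = float(np.mean(np.abs(markers_pred - markers_gt)))
    # Per-marker contact classification, shape (batch, n_markers)
    l_contact = binary_cross_entropy(contact_pred, contact_gt)
    # Regularize the approximate posterior toward the standard normal prior
    l_kl = kl_to_standard_normal(mu, logvar)
    total = w_marker * l_marker + w_contact * l_contact + w_kl * l_kl
    return total, {"marker": l_marker, "contact": l_contact, "kl": l_kl}


# Usage with random stand-in data (batch B, N markers, latent size D)
rng = np.random.default_rng(0)
B, N, D = 4, 8, 16
markers_gt = rng.standard_normal((B, N, 3))
contact_gt = (rng.random((B, N)) > 0.5).astype(float)
mu, logvar = np.zeros((B, D)), np.zeros((B, D))
z = reparameterize(mu, logvar, rng)
total, parts = multitask_cvae_loss(markers_gt, markers_gt, np.full((B, N), 0.5),
                                   contact_gt, mu, logvar)
print(parts)
```

With perfect marker reconstruction, contact predictions of 0.5, and a posterior equal to the prior, the marker and KL terms vanish and the contact term equals ln 2, so the total is driven entirely by the contact branch.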

Methodology

WholeGrasp-VAE: A multi-task conditional variational autoencoder (CVAE), conditioned on the target object, is trained to generate both body marker positions and contact probabilities. Learning the two jointly keeps the generated body pose consistent with the predicted human-object contacts, which is crucial for realistic grasp synthesis.
MotionFill-VAE: The in-between motion is synthesized by a two-stage CVAE. TrajFill first predicts the global body trajectory between the start and end poses, and LocalMotionFill then generates the local body motion conditioned on that trajectory and the two endpoint poses.
Contact Optimization: After generation, optimization steps use the predicted contact maps to refine both the static grasp pose and the motion, reducing non-physical interpenetration and improving contact fidelity.
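The contact-guided refinement can be sketched as gradient descent on an energy that pulls markers with high predicted contact probability onto the object surface and penalizes points that end up inside the object. This is a simplified stand-in, not SAGA's actual optimization: the object is assumed to be a sphere with an analytic signed distance function (SDF), only a global translation of the hand markers is optimized (instead of full body parameters), and the weights, learning rate, and step count are hypothetical.

```python
import numpy as np


def sphere_sdf(points, center, radius):
    """Signed distance to a sphere: negative inside, positive outside."""
    return np.linalg.norm(points - center, axis=-1) - radius


def sphere_sdf_grad(points, center, eps=1e-9):
    """Gradient of the sphere SDF w.r.t. each point, i.e. the unit outward normal."""
    diff = points - center
    return diff / (np.linalg.norm(diff, axis=-1, keepdims=True) + eps)


def contact_energy(markers, contact_prob, center, radius, w_contact=1.0, w_pen=10.0):
    """Contact term pulls likely-contact markers to the surface; penetration term pushes points out."""
    sd = sphere_sdf(markers, center, radius)
    e_contact = w_contact * np.sum(contact_prob * sd**2)
    e_pen = w_pen * np.sum(np.minimum(sd, 0.0) ** 2)
    return e_contact + e_pen


def optimize_translation(markers, contact_prob, center, radius,
                         w_contact=1.0, w_pen=10.0, lr=0.01, steps=500):
    """Plain gradient descent on a single translation shared by all markers."""
    t = np.zeros(3)
    for _ in range(steps):
        x = markers + t
        sd = sphere_sdf(x, center, radius)
        # dE/d(sd_i) for the contact term plus the penetration term
        d_sd = 2.0 * w_contact * contact_prob * sd + 2.0 * w_pen * np.minimum(sd, 0.0)
        # Chain rule through the SDF; the shared translation sums every marker's gradient
        grad_t = np.sum(d_sd[:, None] * sphere_sdf_grad(x, center), axis=0)
        t -= lr * grad_t
    return t


# Usage: three "fingertip" markers likely in contact, one "palm" marker unlikely
center, radius = np.zeros(3), 1.0
markers = np.array([[2.0, 0.0, 0.0], [2.0, 0.1, 0.0], [2.0, -0.1, 0.0], [2.3, 0.0, 0.0]])
contact_prob = np.array([0.9, 0.9, 0.9, 0.1])
t = optimize_translation(markers, contact_prob, center, radius)
sd_after = sphere_sdf(markers + t, center, radius)
```

In this example the three high-probability markers settle close to the sphere surface, while the penetration term stops them from sinking noticeably into it; the low-probability marker has little influence on where the hand ends up.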

Experimental Evaluation

Diversity and Realism: The evaluation metrics, including contact ratio and interpenetration volume, indicate that SAGA generates diverse grasps while maintaining realistic contact with the objects. The results show improvements over baselines such as extensions of GrabNet, particularly in reducing hand-object interpenetration, a common failure mode in grasp synthesis.
User Studies: Perceptual studies on Amazon Mechanical Turk indicate that users find the generated motions believable, although they still fall short of ground-truth sequences from datasets such as GRAB.
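The evaluation quantities above can be approximated in a few lines of NumPy. This sketch assumes a sphere-shaped object, a hypothetical contact threshold `tau`, and uses mean maximum penetration depth as a simple proxy for interpenetration volume; diversity is measured here as the average pairwise distance (APD) between generated samples. These are illustrative definitions, and the paper's exact metrics may differ.

```python
import numpy as np


def average_pairwise_distance(samples):
    """Mean Euclidean distance over all ordered pairs of distinct flattened samples."""
    flat = samples.reshape(samples.shape[0], -1)
    n = flat.shape[0]
    if n < 2:
        raise ValueError("APD needs at least two samples")
    dists = np.linalg.norm(flat[:, None, :] - flat[None, :, :], axis=-1)
    # The diagonal is zero, so dividing by n*(n-1) averages over i != j
    return float(dists.sum() / (n * (n - 1)))


def contact_ratio(samples, center, radius, tau=0.01):
    """Fraction of samples with at least one point inside or within tau of the surface."""
    sd = np.linalg.norm(samples - center, axis=-1) - radius  # shape (S, N)
    return float(np.mean(np.any(sd < tau, axis=1)))


def mean_max_penetration(samples, center, radius):
    """Average over samples of the deepest penetration into the object (depth proxy)."""
    sd = np.linalg.norm(samples - center, axis=-1) - radius
    return float(np.mean(np.maximum(0.0, -sd).max(axis=1)))


# Usage: three hypothetical samples, each with two hand points, and a unit sphere
center, radius = np.zeros(3), 1.0
samples = np.array([
    [[1.0, 0.0, 0.0], [1.5, 0.0, 0.0]],  # touches the surface
    [[3.0, 0.0, 0.0], [4.0, 0.0, 0.0]],  # far from the object
    [[0.8, 0.0, 0.0], [1.2, 0.0, 0.0]],  # penetrates by 0.2
])
ratio = contact_ratio(samples, center, radius)
pen = mean_max_penetration(samples, center, radius)
apd = average_pairwise_distance(samples)
```

Reporting contact and penetration together matters: a generator can trivially raise its contact ratio by pushing the hand into the object, which the penetration measure would expose.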

Implications and Future Directions

This research illustrates the potential of integrating contact-aware mechanisms into generative models for human-object interaction. The two-stage approach, which first synthesizes a contact-rich end pose and then infills the motion leading to it, is a promising step toward realistic human motion. Moving forward, incorporating physics-based manipulation strategies that model object affordances and interaction context could further improve motion realism, especially for nuanced tasks such as tool use or manipulating an object after grasping it.

In summary, SAGA lays a foundation for applications that require animated human-object interaction. Future research could extend these methods to more complex and longer object interactions, improving the realism and utility of synthetic human motion in digital environments.
