- The paper introduces WASABI, an approach that extracts robust reward signals from rough, partial human demonstrations to train agile robotic skills.
- The method uses a Wasserstein GAN loss to stabilize adversarial imitation learning and outperforms traditional LSGAN-based techniques.
- Hardware experiments on the Solo 8 robot demonstrate rapid acquisition of complex skills such as a backflip, highlighting effective sim-to-real transfer.
Learning Agile Skills via Adversarial Imitation of Rough Partial Demonstrations
This paper presents a novel approach to teaching legged robots, such as quadrupeds, complex agile movements through adversarial imitation learning. The authors propose Wasserstein Adversarial Behavior Imitation (WASABI), a method that extracts reward functions from partial and rough human demonstrations, allowing robots to acquire dynamic skills that would otherwise require meticulously engineered reward functions.
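To make the core idea concrete, here is a minimal sketch (not the authors' implementation) of how a learned critic can stand in for a hand-engineered reward: the critic scores state transitions, and the policy is trained to maximize that score. The layer sizes and the transition encoding below are placeholder assumptions.

```python
import torch

# Hypothetical critic: scores a (partial) state transition; higher
# means "more demonstration-like". Layer sizes are placeholders.
critic = torch.nn.Sequential(
    torch.nn.Linear(8, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, 1),
)

def imitation_reward(state: torch.Tensor, next_state: torch.Tensor) -> torch.Tensor:
    """Use the critic's score of (state, next_state) as the RL reward."""
    with torch.no_grad():
        return critic(torch.cat([state, next_state], dim=-1)).squeeze(-1)
```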
Summary of Contributions
- Adversarial Imitation Method: The WASABI approach uses an adversarial network setup to infer task reward functions from rough human demonstrations that capture only a limited state representation (e.g., base motion without joint information) and may even be infeasible for the robot's physical embodiment.
- Use of Wasserstein GAN: Rather than traditional GAN losses, which often suffer from vanishing gradients, the authors employ a Wasserstein GAN loss to improve the stability and convergence of the adversarial learning process. This shift allows for a more robust extraction of imitation rewards from demonstrations while mitigating mode collapse (a sketch of such a critic loss follows this list).
- Practical Deployments: WASABI was tested on the quadrupedal robot Solo 8, which acquired complex skills such as a backflip. These skills were learned without hand-specified task rewards, demonstrating the potential for rapid skill development and sim-to-real transfer.
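The sketch below shows a generic Wasserstein critic update with a gradient penalty (WGAN-GP style): demonstration transitions are pushed toward high scores, policy transitions toward low scores, and the penalty enforces the 1-Lipschitz constraint that keeps the Wasserstein objective well-behaved. This is the textbook recipe, not the paper's exact regularizer or reward scaling.

```python
import torch

def critic_loss(critic: torch.nn.Module,
                demo_batch: torch.Tensor,
                policy_batch: torch.Tensor,
                gp_weight: float = 10.0) -> torch.Tensor:
    """Wasserstein critic loss with gradient penalty (WGAN-GP style)."""
    # Wasserstein objective: raise demonstration scores, lower policy scores.
    w_loss = critic(policy_batch).mean() - critic(demo_batch).mean()

    # Gradient penalty on random interpolations between the two batches,
    # driving the critic's gradient norm toward 1 (1-Lipschitz constraint).
    alpha = torch.rand(demo_batch.size(0), 1)
    interp = (alpha * demo_batch + (1 - alpha) * policy_batch).requires_grad_(True)
    grads = torch.autograd.grad(critic(interp).sum(), interp, create_graph=True)[0]
    gp = ((grads.norm(2, dim=-1) - 1.0) ** 2).mean()

    return w_loss + gp_weight * gp
```

Compared with a least-squares (LSGAN) objective, the Wasserstein loss keeps providing a useful gradient even once the critic separates the two distributions well, which is exactly the regime that arises when imitating rough, partial demonstrations.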
Key Results and Findings
- Empirical validation showed that WASABI generalizes and achieves highly dynamic skills without engineered reward functions. Compared to the LSGAN-based approach, WASABI consistently produced more informative reward signals and reached the desired behaviors more reliably.
- Quantitative results, reported via the Dynamic Time Warping (DTW) distance between learned and demonstrated trajectories, showed that WASABI outperformed baseline methods when learning from both full and partial demonstrations, confirming its capability to learn from rough demonstrations (a minimal DTW sketch follows this list).
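For reference, here is a minimal Dynamic Time Warping implementation of the kind of trajectory-similarity measure referred to above; this is the textbook algorithm, not necessarily the paper's exact evaluation code.

```python
import numpy as np

def dtw_distance(traj_a: np.ndarray, traj_b: np.ndarray) -> float:
    """DTW distance between trajectories of shape [T, D]; lower is more similar."""
    n, m = len(traj_a), len(traj_b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(traj_a[i - 1] - traj_b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return float(cost[n, m])
```

DTW is a natural metric here because a learned motion may execute faster or slower than the demonstration; the warping aligns the two trajectories in time before accumulating the pointwise distances.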
Implications and Future Work
The introduction of WASABI has notable implications for both theoretical research and practical applications in robotics. From a practical standpoint, the method simplifies teaching robots complex tasks by reducing the need for finely tuned reward engineering, which is often infeasible in dynamic environments.
Theoretically, the work extends the existing literature by applying GAN frameworks with Wasserstein losses beyond purely generative tasks, showcasing their potential in robotic control and imitation learning. Future research directions could include probing the limits of the rough-demonstration paradigm with different robot morphologies and task difficulties.
Continued work could involve improving the discriminator's understanding of physically feasible actions, making WASABI applicable to more diverse robotics settings. Additionally, exploring the integration of more sophisticated sensing and perception capabilities might further enhance performance in unstructured environments.
The findings from this paper provide a tangible contribution to the field of autonomous robotic movement, presenting methodologies that could lead to significant advancements in robotic learning frameworks, particularly in learning agile behaviors through imitation.