Overview of "panda-gym: Open-source goal-conditioned environments for robotic learning"
Introduction
The paper introduces "panda-gym," a collection of goal-conditioned reinforcement learning (RL) environments for the Franka Emika Panda robot. By integrating the environments with OpenAI Gym and building them on the PyBullet physics engine, the authors target robotic learning research, particularly manipulation tasks where reward functions are sparse. They emphasize open-source development to foster collaborative research and to make it easy to add new tasks and robots.
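To illustrate the Gym integration described above, here is a minimal usage sketch. It assumes that panda-gym is installed and that importing the package registers the environment IDs listed later in this overview; the exact API may vary between releases.

```python
# Minimal usage sketch (assumed API): create a panda-gym environment through
# OpenAI Gym and run one episode with a random policy.
import gym
import panda_gym  # assumed to register the Panda environments on import

env = gym.make("PandaReach-v1")
obs = env.reset()
done = False
while not done:
    action = env.action_space.sample()          # random action, for illustration only
    obs, reward, done, info = env.step(action)  # classic Gym step signature
env.close()
```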
Environment Design
The environments are built around the Franka Emika Panda, a widely used 7-DoF robotic arm with a parallel-finger gripper, simulated in PyBullet, which keeps the stack fully open source while offering competitive simulation performance. The authors model their environments on those of Plappert et al. and Andrychowicz et al., extending them with an additional stacking task. Each task follows the multi-goal RL framework: a goal is randomly sampled at the start of every episode, and observations are augmented with the desired and achieved goals.
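Concretely, each observation in this multi-goal setup is a dictionary bundling the state with the achieved and desired goals. The key names below follow the multi-goal Gym convention of Plappert et al. that the paper builds on, and are assumptions rather than quotes from the paper:

```python
# Sketch of the goal-conditioned observation structure (assumed key names,
# following the multi-goal Gym convention the paper builds on).
import gym
import panda_gym

env = gym.make("PandaPush-v1")
obs = env.reset()
print(obs["observation"].shape)    # robot (and object) state
print(obs["achieved_goal"].shape)  # goal currently realized (e.g. object position)
print(obs["desired_goal"].shape)   # randomized target for this episode
```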
Task Specifications
There are five primary tasks:
- PandaReach-v1: The gripper must reach a randomly generated target position.
- PandaPush-v1: A cube, initially on a table, is to be pushed to a target position on the table.
- PandaSlide-v1: A flat cylinder must be slid to its target, requiring an impulse.
- PandaPickAndPlace-v1: A cube must be picked up and moved to a target position that may lie above the table, so grasping is required.
- PandaStack-v1: The task involves stacking two cubes in a specific order.
Each task defines its own observation and action space within the same goal-conditioned Gym interface, as the sketch below illustrates.
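Because all five tasks expose the same Gym interface, their spaces can be inspected in a short loop; the v1 environment IDs are those listed above:

```python
# Print the observation and action spaces of each task (IDs assumed as above).
import gym
import panda_gym

TASK_IDS = [
    "PandaReach-v1",
    "PandaPush-v1",
    "PandaSlide-v1",
    "PandaPickAndPlace-v1",
    "PandaStack-v1",
]

for task_id in TASK_IDS:
    env = gym.make(task_id)
    print(task_id, env.observation_space, env.action_space)
    env.close()
```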
Observational and Action Dynamics
The observation space varies per task but always includes the gripper's position and velocity; tasks involving an object add data such as the object's position and orientation. The action space is correspondingly compact, covering Cartesian gripper movement plus finger control where grasping is needed, and episodes last only a few seconds of simulated time.
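To make the action semantics concrete, the sketch below drives the reach task with a crude proportional controller. It assumes the action is a 3-D Cartesian displacement of the gripper and that the achieved goal coincides with the current gripper position, which holds for the reach task as described; the gain is arbitrary.

```python
# Illustrative proportional controller for PandaReach-v1 (assumptions: the
# action is an end-effector displacement and the achieved goal is the
# current gripper position).
import gym
import numpy as np
import panda_gym

env = gym.make("PandaReach-v1")
obs = env.reset()
done = False
while not done:
    delta = obs["desired_goal"] - obs["achieved_goal"]   # vector toward the target
    action = np.clip(5.0 * delta, env.action_space.low, env.action_space.high)
    obs, reward, done, info = env.step(action)
env.close()
```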
Reward Structuring
Two reward structures are explored: sparse (a binary check of task completion) and dense (the negative distance to the goal). Sparse rewards are simple to specify, whereas dense rewards for tasks with several success criteria, such as stacking, must be shaped carefully and bring hyperparameter tuning challenges.
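In code, the two schemes reduce to a binary threshold check versus a negative distance; the threshold value below is an illustrative assumption, not a figure quoted from the paper:

```python
# Sketch of the two reward schemes described in the paper.
import numpy as np

DISTANCE_THRESHOLD = 0.05  # assumed success threshold (metres), for illustration

def sparse_reward(achieved_goal, desired_goal):
    """Binary scheme: 0 when within the threshold of the goal, -1 otherwise."""
    d = np.linalg.norm(achieved_goal - desired_goal, axis=-1)
    return -(d > DISTANCE_THRESHOLD).astype(np.float32)

def dense_reward(achieved_goal, desired_goal):
    """Dense scheme: negative Euclidean distance to the goal."""
    return -np.linalg.norm(achieved_goal - desired_goal, axis=-1)
```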
Design Considerations
The modular architecture separates the task definition from robot control, making it straightforward to add new robots or tasks. On the simulation side, the authors report a 9.2% speed advantage over MuJoCo.
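The kind of decomposition the paper describes can be pictured as a robot class and a task class composed into a single goal-conditioned environment. The class and method names below are hypothetical and only sketch the idea:

```python
# Hypothetical sketch of the robot/task separation the paper describes.
# Class and method names are illustrative, not the library's actual API.
class Robot:
    """Encapsulates robot control: applies an action, reports its own state."""
    def set_action(self, action): ...
    def get_obs(self): ...

class Task:
    """Encapsulates the task: goal sampling, achieved goal, reward."""
    def sample_goal(self): ...
    def get_achieved_goal(self): ...
    def compute_reward(self, achieved_goal, desired_goal): ...

class RobotTaskEnv:
    """Composes any robot with any task into one goal-conditioned environment."""
    def __init__(self, robot, task):
        self.robot, self.task = robot, task
        self.goal = self.task.sample_goal()

    def step(self, action):
        self.robot.set_action(action)
        obs = {
            "observation": self.robot.get_obs(),
            "achieved_goal": self.task.get_achieved_goal(),
            "desired_goal": self.goal,
        }
        reward = self.task.compute_reward(obs["achieved_goal"], obs["desired_goal"])
        return obs, reward
```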
Experimental Outcomes
The paper provides baseline results with the off-policy algorithms DDPG, SAC, and TD3, combined with Hindsight Experience Replay (HER). The results show that most tasks are solvable: DDPG reaches a 100% success rate on simpler tasks such as reach and push within a few thousand timesteps, while the more complex stacking task remains unsolved within the evaluated budget. Ablation studies on HER and the double-Q trick show how the reward scheme and these algorithmic choices affect learning efficiency.
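As a sketch of how such a baseline could be reproduced, the snippet below trains DDPG with hindsight relabelling via stable-baselines3. It assumes a recent stable-baselines3 release and uses illustrative hyperparameters; it is not the paper's exact training configuration:

```python
# Training sketch with DDPG + HER (assumes a recent stable-baselines3 release;
# hyperparameters are illustrative, not the paper's exact configuration).
import gym
import panda_gym
from stable_baselines3 import DDPG, HerReplayBuffer

env = gym.make("PandaPush-v1")
model = DDPG(
    "MultiInputPolicy",                    # dict observations need the multi-input policy
    env,
    replay_buffer_class=HerReplayBuffer,   # hindsight relabelling of stored transitions
    replay_buffer_kwargs=dict(n_sampled_goal=4, goal_selection_strategy="future"),
    verbose=1,
)
model.learn(total_timesteps=100_000)
model.save("ddpg_panda_push")
```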
Conclusions and Future Directions
The paper concludes by affirming panda-gym’s utility in advancing RL-based robotic research. With five tasks already available, the framework’s modularity invites the integration of new, realistic tasks such as peg-in-hole or cube flipping. Future plans include joint-level robot control and multimodal observation integration, which could significantly enhance real-world applicability.
In summary, "panda-gym" represents a valuable toolset for the AI research community, providing a robust platform for experimenting with and evaluating RL algorithms in robotic manipulation tasks. Its open-source nature and flexible design are likely to stimulate broader investigation and innovation in related fields.