D4RL: Datasets for Deep Data-Driven Reinforcement Learning (2004.07219v4)

Published 15 Apr 2020 in cs.LG and stat.ML

Abstract: The offline reinforcement learning (RL) setting (also known as full batch RL), where a policy is learned from a static dataset, is compelling as progress enables RL methods to take advantage of large, previously-collected datasets, much like how the rise of large datasets has fueled results in supervised learning. However, existing online RL benchmarks are not tailored towards the offline setting and existing offline RL benchmarks are restricted to data generated by partially-trained agents, making progress in offline RL difficult to measure. In this work, we introduce benchmarks specifically designed for the offline setting, guided by key properties of datasets relevant to real-world applications of offline RL. With a focus on dataset collection, examples of such properties include: datasets generated via hand-designed controllers and human demonstrators, multitask datasets where an agent performs different tasks in the same environment, and datasets collected with mixtures of policies. By moving beyond simple benchmark tasks and data collected by partially-trained RL agents, we reveal important and unappreciated deficiencies of existing algorithms. To facilitate research, we have released our benchmark tasks and datasets with a comprehensive evaluation of existing algorithms, an evaluation protocol, and open-source examples. This serves as a common starting point for the community to identify shortcomings in existing offline RL methods and a collaborative route for progress in this emerging area.

Authors (5)
  1. Justin Fu (20 papers)
  2. Aviral Kumar (74 papers)
  3. Ofir Nachum (64 papers)
  4. George Tucker (45 papers)
  5. Sergey Levine (531 papers)
Citations (1,178)

Summary

  • The paper presents a comprehensive benchmark suite that curates diverse datasets from robotics, autonomous driving, and traffic management for offline RL.
  • It highlights dataset properties such as sparse rewards, suboptimal policies, and partial observability to expose current algorithmic challenges.
  • It proposes a standardized evaluation protocol that enables reproducible and fair comparisons among state-of-the-art offline RL methods.

Overview of "D4RL: Datasets for Deep Data-Driven Reinforcement Learning"

The paper "D4RL: Datasets for Deep Data-Driven Reinforcement Learning" aims to address challenges in the domain of offline reinforcement learning (RL) by introducing a comprehensive suite of datasets and benchmarks. These benchmarks are meticulously designed to simulate realistic, real-world applications of offline RL.

Key Contributions

The paper makes several key contributions to the field of offline RL:

  1. Benchmark Design: The authors propose a collection of datasets derived from various domains such as robotics, autonomous driving, and traffic management. These datasets are curated to possess essential properties, such as narrow data distributions, undirected and multitask data, sparse rewards, suboptimal data, non-representable behavior policies, and partial observability.
  2. Dataset Properties: Emphasis is placed on including datasets generated from diverse sources such as human demonstrations, hand-coded controllers, and mixtures of policies. This diversity aims to highlight deficiencies in current offline RL algorithms and push for more generalized solutions.
  3. Evaluation Protocol: The paper introduces a standardized evaluation protocol that includes both training and evaluation tasks, promoting reproducibility and fair comparisons among different algorithms.
  4. Open-source Release: The benchmark suite, along with a comprehensive evaluation of existing offline RL algorithms, is made publicly available. This serves as a common starting point for the research community.
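As a concrete illustration of the open-source release described above, the sketch below shows the typical pattern for loading one of the benchmark datasets with the d4rl package. The environment name and dataset keys used here are assumptions based on the released code and may differ across package versions.

```python
# Minimal sketch of loading a D4RL dataset; exact environment names and
# dataset keys may differ across versions of the released package.
import gym
import d4rl  # importing d4rl registers the offline RL environments with gym

env = gym.make('hopper-medium-v0')   # Gym-MuJoCo task with partially-trained policy data
dataset = env.get_dataset()          # dict of numpy arrays keyed by field name

print(dataset['observations'].shape)  # (num_steps, obs_dim)
print(dataset['actions'].shape)       # (num_steps, act_dim)
print(dataset['rewards'].shape)       # (num_steps,)
print(dataset['terminals'].shape)     # (num_steps,)

# A (s, a, r, s', done) layout convenient for Q-learning-style algorithms:
transitions = d4rl.qlearning_dataset(env)
```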

Benchmark Domains and Tasks

The proposed benchmark includes a variety of domains:

  • Maze2D and AntMaze: These domains focus on navigation tasks. Maze2D uses a 2D agent while AntMaze uses an 8-DoF "Ant" robot. The tasks test the ability of algorithms to stitch together sub-trajectories to reach a goal, with the data generated via planners.
  • Gym-MuJoCo: Traditional RL benchmark tasks (e.g., Hopper, HalfCheetah, Walker2d) with datasets derived from partially-trained and random policies. Mixtures of policy data are also included to assess the robustness of algorithms to heterogeneous data.
  • Adroit: High-dimensional robotic manipulation tasks using a 24-DoF Shadow Hand robot, with datasets collected from human demonstrations, expert RL policies, and policies trained by imitating the demonstrations.
  • FrankaKitchen: Tasks involving a 9-DoF Franka robot interacting with various objects in a kitchen environment. The datasets test the generalization ability of algorithms as they contain partial and undirected data.
  • Flow: Traffic management tasks with autonomous vehicles, with datasets generated by hand-designed controllers (the Intelligent Driver Model) and random policies.
  • CARLA: High-fidelity autonomous driving simulator with tasks using RGB image inputs, focusing on lane following and town navigation.
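For orientation, the identifiers below illustrate how tasks in these domains are named in the released benchmark. The list is indicative rather than exhaustive, and exact names and version suffixes may vary between releases.

```python
# Representative D4RL task identifiers (illustrative; exact names/versions may vary).
EXAMPLE_TASKS = [
    'maze2d-umaze-v1',               # Maze2D navigation with a 2D point agent
    'antmaze-medium-play-v0',        # AntMaze navigation with the 8-DoF Ant robot
    'hopper-medium-v0',              # Gym-MuJoCo, data from a partially-trained policy
    'halfcheetah-medium-expert-v0',  # Gym-MuJoCo, mixture of policies
    'pen-human-v0',                  # Adroit manipulation from human demonstrations
    'kitchen-partial-v0',            # FrankaKitchen, partial/undirected multitask data
    'flow-ring-random-v0',           # Flow traffic control, random-policy data
    'carla-lane-v0',                 # CARLA lane following from RGB images
]
```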

Evaluation of Algorithms

The paper evaluates several state-of-the-art offline RL algorithms, including behavioral cloning (BC), behavior-regularized actor-critic with policy and value penalties (BRAC-p and BRAC-v), bootstrapping error accumulation reduction (BEAR), advantage-weighted regression (AWR), batch-constrained Q-learning (BCQ), AlgaeDICE, and a continuous-action variant of Random Ensemble Mixture (cREM).
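To make results comparable across domains, the evaluation protocol reports normalized scores, where 0 corresponds to the average return of a random policy and 100 to that of a reference expert policy. The sketch below restates that normalization; the helper get_normalized_score is part of the released package, though some versions return the score on a 0-1 scale rather than 0-100.

```python
# Normalized score used by the evaluation protocol:
#   0   = average return of a random policy
#   100 = average return of a reference (expert) policy
def normalized_score(episode_return, random_return, expert_return):
    return 100.0 * (episode_return - random_return) / (expert_return - random_return)

# With the released package, the reference returns are stored per environment,
# so evaluation typically looks like (API details may vary by version):
#   import gym, d4rl
#   env = gym.make('hopper-medium-v0')
#   score = env.get_normalized_score(episode_return)  # 0-1 scale in some versions
```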

Findings

The evaluation results provide valuable insights:

  1. Performance on Realistic Data: Algorithms perform well on datasets collected from RL-trained policies but struggle with more complex properties like undirected data and mixtures of policies.
  2. Sparse Rewards: Offline RL shows promise in overcoming exploration challenges, with algorithms often outperforming online SAC on sparse reward tasks.
  3. Generalization to Multitask Data: Tasks involving human demonstrations and undirected data, such as in FrankaKitchen, remain challenging for existing methods.

Implications and Future Work

The benchmark suite introduced in the paper sets a high bar for future research in offline RL. By highlighting the shortcomings of current algorithms with realistic and diverse datasets, it encourages the development of more robust and generalized methods. The paper also suggests the need for reliable off-policy evaluation techniques and standardization of real-world benchmarks.

In summary, "D4RL: Datasets for Deep Data-Driven Reinforcement Learning" significantly advances the field of offline RL by providing a comprehensive and challenging set of benchmarks. This work paves the way for future developments aimed at addressing real-world challenges in offline reinforcement learning.
