- The paper introduces a unified benchmark that standardizes experimental setups for MBRL, enabling clear comparisons across diverse algorithms.
- It evaluates 11 model-based versus 4 model-free RL algorithms in 18 environments, revealing performance trade-offs between short and extended training horizons.
- The analysis identifies key bottlenecks such as dynamics modeling errors, planning horizon trade-offs, and the drawbacks of early termination in MBRL.
Benchmarking Model-Based Reinforcement Learning: A Summary
Model-Based Reinforcement Learning (MBRL) promises better sample efficiency than Model-Free Reinforcement Learning (MFRL), but disparate experimental setups and poor reproducibility have made it difficult to compare MBRL algorithms consistently. This paper addresses the problem by establishing a unified benchmark covering 11 MBRL algorithms and 4 MFRL algorithms across 18 environments designed for MBRL evaluation. By standardizing problem settings, including noisy variants of each environment, the benchmark provides a common framework for advancing MBRL research.
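As a concrete illustration of the standardized noisy settings, here is a minimal sketch of one common way to build a noisy variant of a control task: perturbing observations with Gaussian noise. The wrapper class, noise scale, and environment name are illustrative assumptions for a Gym-style interface, not the benchmark's actual API.

```python
import numpy as np
import gym


class GaussianObsNoise(gym.ObservationWrapper):
    """Hypothetical wrapper: adds Gaussian noise to observations,
    mimicking the benchmark's noisy-environment variants."""

    def __init__(self, env, sigma=0.1):
        super().__init__(env)
        self.sigma = sigma  # noise scale; an assumed value, not the paper's

    def observation(self, obs):
        # Perturb every observation the agent sees.
        return obs + self.sigma * np.random.randn(*obs.shape)


# Usage: evaluate any algorithm on a noisy variant of a standard task.
env = GaussianObsNoise(gym.make("HalfCheetah-v2"), sigma=0.1)
```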
Key Findings
- Algorithmic Evaluation:
- The paper comprehensively evaluates algorithms such as Dyna-Style Algorithms (e.g., ME-TRPO, MB-MPO), Policy Search with Backpropagation through Time (e.g., PILCO, iLQG), and Shooting Algorithms (e.g., RS, PETS-CEM).
- Findings indicate that shooting algorithms such as PETS-CEM, which plan through uncertainty-aware ensemble dynamics models, are generally the strongest performers, particularly in stochastic environments (a minimal planner sketch follows this list).
- Performance Metrics:
- At the benchmark's 200k time-step budget, the best MBRL algorithms perform comparably to strong MFRL baselines such as SAC and TD3, especially in less complex environments.
- However, when training is extended to 1 million time-steps, a persistent performance gap remains: MBRL performance plateaus, which the paper attributes to compounding errors in the learned dynamics model and terms the "dynamics bottleneck."
- Challenges Highlighted:
- The paper identifies three critical bottlenecks: the dynamics bottleneck, the planning horizon dilemma, and the early termination dilemma. The dynamics bottleneck refers to MBRL performance stagnating despite increased data availability for model learning.
- The planning horizon dilemma is a trade-off: longer horizons compound model errors and enlarge the search space (the curse of dimensionality), while shorter horizons produce myopic plans that are insufficient for complex environments.
- Early termination, which in MFRL speeds up learning by cutting off unpromising trajectories, is shown to significantly degrade the performance of every MBRL algorithm tested.
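To make the shooting family and the horizon trade-off concrete, below is a minimal sketch of a CEM shooting planner in the spirit of PETS-CEM, under assumed interfaces: `dynamics(states, actions)` and `reward_fn(states, actions)` are hypothetical batched callables into a learned model, not the benchmark's API. Note how the `horizon` argument directly controls how far model errors can compound during planning.

```python
import numpy as np


def cem_plan(dynamics, reward_fn, state, horizon=30, pop=400,
             elites=40, iters=5, act_dim=6):
    """Sketch of cross-entropy-method (CEM) shooting over a learned model.
    Returns the first action of the refined action-sequence distribution."""
    mean = np.zeros((horizon, act_dim))
    std = np.ones((horizon, act_dim))
    for _ in range(iters):
        # Sample candidate action sequences: shape (pop, horizon, act_dim).
        acts = np.clip(mean + std * np.random.randn(pop, horizon, act_dim),
                       -1.0, 1.0)
        returns = np.zeros(pop)
        states = np.repeat(state[None], pop, axis=0)
        for t in range(horizon):  # roll candidates out through the model
            returns += reward_fn(states, acts[:, t])
            states = dynamics(states, acts[:, t])
        # Refit the sampling distribution to the elite sequences.
        elite = acts[np.argsort(returns)[-elites:]]
        mean, std = elite.mean(axis=0), elite.std(axis=0)
    return mean[0]  # MPC-style: execute one action, then replan
```

Longer horizons let the planner credit delayed rewards, but they also feed each step's model error into the next prediction, which is exactly the dilemma the paper documents.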
Implications of Findings
The benchmark provides insights that directly address existing limitations in MBRL research. By isolating these bottlenecks, the paper charts a path for algorithmic development grounded in shared benchmarking practice. The observed performance plateau underscores the need for better dynamics modeling, particularly better handling of compounding model errors over long planning horizons. The early termination results likewise call for strategies that incorporate termination prediction and its uncertainty into planning.
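As a rough illustration of how the dynamics bottleneck might be diagnosed, this sketch rolls a learned one-step model forward open-loop along a ground-truth trajectory and records how prediction error grows with rollout depth. The `dynamics` interface and array shapes are assumptions for illustration, not code from the paper.

```python
import numpy as np


def multistep_error(dynamics, states, actions):
    """Open-loop rollout error: states has shape (T+1, obs_dim) from a real
    trajectory, actions has shape (T, act_dim); dynamics(s, a) -> s_next is
    the assumed interface of the learned one-step model."""
    s_hat = states[0]
    errors = []
    for t in range(len(actions)):
        s_hat = dynamics(s_hat, actions[t])  # predict from the prediction
        errors.append(np.linalg.norm(s_hat - states[t + 1]))
    return np.array(errors)  # error typically grows with rollout depth
```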
Future Prospects
Advancing MBRL will likely involve hybrid methods that combine the asymptotic performance of MFRL with the sample efficiency of MBRL. Given the difficulties with early termination and long planning horizons, adaptive planning mechanisms and more efficient gradient-based policy updates are promising directions.
Because the benchmark is open source, it invites broad community collaboration and can accelerate progress by surfacing new insights into the trade-offs between model accuracy and policy flexibility in MBRL.
In conclusion, this paper represents a pivotal step in standardizing MBRL research, establishing a solid foundation for evaluating and enhancing model-based algorithms, and forging a clear trajectory for future endeavors in this dynamic field.