Using Parameterized Black-Box Priors to Scale Up Model-Based Policy Search for Robotics (1709.06917v2)

Published 20 Sep 2017 in cs.RO, cs.AI, cs.LG, cs.NE, and stat.ML

Abstract: The most data-efficient algorithms for reinforcement learning in robotics are model-based policy search algorithms, which alternate between learning a dynamical model of the robot and optimizing a policy to maximize the expected return given the model and its uncertainties. Among the few proposed approaches, the recently introduced Black-DROPS algorithm exploits a black-box optimization algorithm to achieve both high data-efficiency and good computation times when several cores are used; nevertheless, like all model-based policy search approaches, Black-DROPS does not scale to high dimensional state/action spaces. In this paper, we introduce a new model learning procedure in Black-DROPS that leverages parameterized black-box priors to (1) scale up to high-dimensional systems, and (2) be robust to large inaccuracies of the prior information. We demonstrate the effectiveness of our approach with the "pendubot" swing-up task in simulation and with a physical hexapod robot (48D state space, 18D action space) that has to walk forward as fast as possible. The results show that our new algorithm is more data-efficient than previous model-based policy search algorithms (with and without priors) and that it can allow a physical 6-legged robot to learn new gaits in only 16 to 30 seconds of interaction time.

Citations (43)

Summary

  • The paper introduces parameterized black-box priors to scale model-based policy search for robotics to high-dimensional state/action spaces while handling prior inaccuracies.
  • The method significantly improves data efficiency compared to existing techniques, enabling a hexapod robot to learn new gaits in just 16-30 seconds.
  • Using parameterized priors provides a robust way to integrate simulation insights with empirical data, reducing the reality gap and accelerating complex robotics development.

Parameterized Black-Box Priors for Scaling Model-Based Policy Search in Robotics

In reinforcement learning for robotics, data efficiency is a pivotal concern because acquiring real-world data is costly in both time and hardware wear. Model-based policy search methods are a promising avenue: they alternate between learning a model of the system's dynamics from the collected data and optimizing the policy to maximize the expected return under that model. Among existing solutions, Black-DROPS has demonstrated strong potential by employing a black-box optimization strategy that achieves both high data efficiency and good computation times when several cores are available. However, its scalability to high-dimensional state/action spaces remains a critical challenge.
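
To make this alternation concrete, the following is a minimal Python sketch of the generic loop behind Black-DROPS-style model-based policy search. The helper callables (rollout_on_robot, learn_model, expected_return, blackbox_maximize) are hypothetical placeholders for the robot interface, the probabilistic dynamics model, the Monte Carlo return estimate, and a CMA-ES-style optimizer; they are not the authors' API.

```python
def model_based_policy_search(init_policy_params, n_episodes,
                              rollout_on_robot, learn_model,
                              expected_return, blackbox_maximize):
    """Alternate between (a) collecting one episode of real data with the
    current policy, (b) fitting a probabilistic dynamics model, and
    (c) optimizing the policy against that model with a gradient-free
    (black-box) optimizer, as in the Black-DROPS family of methods."""
    data = []                                 # observed (state, action, next_state) tuples
    policy_params = init_policy_params
    for _ in range(n_episodes):
        data += rollout_on_robot(policy_params)   # one episode on the real robot
        model = learn_model(data)                 # e.g. Gaussian-process dynamics
        # The expected return is estimated by noisy Monte Carlo rollouts through
        # the model, so a rank-based black-box optimizer (Black-DROPS uses a
        # CMA-ES variant) is used instead of analytic gradients.
        policy_params = blackbox_maximize(
            lambda p: expected_return(p, model), policy_params)
    return policy_params
```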

The paper proposes a new model learning procedure that scales Black-DROPS to high-dimensional systems through parameterized black-box priors. Incorporating these priors into the model learning phase allows the algorithm to (1) scale to high-dimensional state/action spaces and (2) remain robust to substantial inaccuracies in the prior knowledge.
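
As a hedged illustration of how a parameterized black-box prior can enter the model learning step (a sketch of the general idea, not necessarily the paper's exact formulation), the code below uses a tunable simulator as the prior for a Gaussian-process dynamics model: the GP captures only the residual between the observed transitions and the simulator's predictions, and the simulator parameters phi are tuned by a black-box search that maximizes the GP's marginal likelihood. The `simulator` callable, the data arrays, and the use of Nelder-Mead (standing in for a CMA-ES-style optimizer) are assumptions made for illustration.

```python
from scipy.optimize import minimize
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def fit_prior_and_residual_gp(X, Y, simulator, phi0):
    """X: (N, state_dim + action_dim) inputs, Y: (N, state_dim) observed state changes.
    simulator(X, phi) is a hypothetical parameterized black-box prior (e.g. a
    dynamic simulator of the robot) returning predicted state changes."""

    def neg_log_marginal_likelihood(phi):
        residuals = Y - simulator(X, phi)          # what the prior fails to explain
        gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
        gp.fit(X, residuals)                       # kernel hyperparameters fit internally
        return -gp.log_marginal_likelihood_value_  # better phi -> higher data likelihood

    # Black-box search over the simulator parameters (stand-in for CMA-ES).
    phi_star = minimize(neg_log_marginal_likelihood, phi0, method="Nelder-Mead").x

    gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
    gp.fit(X, Y - simulator(X, phi_star))
    # Model prediction = tuned prior + learned residual correction.
    return lambda x: simulator(x, phi_star) + gp.predict(x), phi_star
```

Intuitively, the prior carries most of the structure of the dynamics, so the learned component only needs to correct for the mismatch with the real robot; this is what keeps model learning tractable even in spaces as large as the hexapod's 48D state and 18D action space.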

To assess the efficacy of the proposed method, empirical evaluations were conducted on two robotic tasks: the pendubot swing-up task in simulation and a physical hexapod robot (48D state space, 18D action space) that must walk forward as fast as possible. In both scenarios, the new algorithm surpasses existing model-based methods, including PILCO and Black-DROPS without priors, in data efficiency. Notably, the hexapod was able to learn new gaits within merely 16 to 30 seconds of interaction time, a testament to the method's practical applicability in real-world settings.

The numerical results highlight the algorithm's ability to consistently solve the tasks with fewer real-world interactions than traditional approaches, significantly improving the data efficiency of learning in robotics. Furthermore, the method's applicability across varied setups, from simulated environments to physical robots, illustrates its breadth.

The use of parameterized black-box priors also has notable theoretical implications. It provides a mechanism for integrating simulation-based knowledge with empirical data when updating the model, reducing the 'reality gap' often cited in robotics. This capability is especially valuable in tasks where data collection is resource-intensive or time-constrained.

Practically, introducing parameterized priors into model-based policy search workflows could substantially reduce the time and data required for robot learning, accelerating progress in applications ranging from autonomous navigation to complex manipulation tasks that demand high-dimensional control.

Moving forward, this line of research points toward hybrid approaches that combine deterministic simulations with statistical learning techniques to form robust and versatile learning frameworks. Further work could apply such methods in diverse, dynamically changing environments to strengthen the robustness and generalization capabilities of learning agents.

This research marks a significant step toward data-efficient reinforcement learning, pointing to a path where parameterized priors catalyze more agile and adaptable robotic systems capable of learning complex behaviors from minimal interaction data.
