- The paper introduces a novel integration of path integral stochastic optimal control into the guided policy search framework for handling discontinuous contact dynamics.
- It pairs trajectory-centric local policy optimization with on-policy sampling to efficiently train a global policy that operates on high-dimensional sensory inputs.
- Experimental results on door opening and pick-and-place tasks show success rates of 93.3% and 86.7%, respectively, underscoring improved performance and generalization.
Path Integral Guided Policy Search
The paper "Path Integral Guided Policy Search" presents a method for learning complex feedback control policies for robotic manipulation tasks that require mapping from high-dimensional sensory inputs, like visual data, to motor torques. The focus of the work is on tasks characterized by discontinuous contact dynamics, where traditional methods may struggle. The paper builds on the guided policy search (GPS) framework and introduces several improvements that enhance its applicability and effectiveness in challenging real-world environments.
The core of this work is an extension of the GPS framework that integrates path integral stochastic optimal control (PI²) as a model-free local optimizer. This enables the learning of local policies for tasks with highly discontinuous contact dynamics, in contrast to prior GPS work, which relied on LQR-based optimization under smooth, time-varying linear dynamics models that can falter in the presence of such discontinuities. PI² improves local policies through stochastic search over sampled trajectories, weighting each sample by its cost, which allows the method to handle non-differentiable cost structures.
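The heart of PI² is a cost-weighted average over noisy rollouts. Below is a minimal NumPy sketch of one such update, following the standard PI² formulation rather than the paper's exact implementation; the temperature value and per-timestep cost normalization are assumptions.

```python
import numpy as np

def pi2_update(controls, noise, costs, h=10.0):
    """One PI^2-style update of an open-loop control sequence.

    controls: (T, du)    current mean controls
    noise:    (K, T, du) exploration noise added in each of K rollouts
    costs:    (K, T)     per-timestep costs of the K noisy rollouts
    h:        softmax temperature (assumed value)
    """
    # Cost-to-go from every time step, for every rollout.
    S = np.cumsum(costs[:, ::-1], axis=1)[:, ::-1]            # (K, T)
    # Normalize per time step across rollouts for numerical stability.
    S = (S - S.min(axis=0)) / (S.max(axis=0) - S.min(axis=0) + 1e-8)
    # Low-cost rollouts receive exponentially higher weight.
    w = np.exp(-h * S)
    w /= w.sum(axis=0, keepdims=True)                         # (K, T)
    # Update: cost-weighted average of the exploration noise.
    return controls + np.einsum('kt,ktd->td', w, noise)

# Toy usage: 10 rollouts of a 50-step, 7-DoF torque sequence.
rng = np.random.default_rng(0)
K, T, du = 10, 50, 7
controls = np.zeros((T, du))
noise = rng.normal(0.0, 0.1, size=(K, T, du))
costs = np.sum((controls + noise) ** 2, axis=-1)              # (K, T) toy cost
controls = pi2_update(controls, noise, costs)
```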
The second enhancement incorporates on-policy sampling, which allows training on a new set of task instances at every iteration. This is an advance over the traditional GPS setup, which trains on a fixed set of instances across all iterations. On-policy sampling increases the diversity of the experience gathered during training, thereby improving the generalization of the learned global policy; a sketch of the modified outer loop follows.
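The toy loop below illustrates the idea, reusing the `pi2_update` sketch above; `sample_task_instance` and the quadratic rollout cost are hypothetical stand-ins for drawing a fresh door or bottle pose and executing noisy rollouts on the robot.

```python
import numpy as np

rng = np.random.default_rng(1)
T, du, dx, K = 50, 7, 16      # horizon, action dim, obs dim, rollouts per instance

def sample_task_instance():
    """Stand-in for drawing a fresh object pose each iteration,
    instead of cycling through a fixed training set."""
    return rng.uniform(-1.0, 1.0, size=du)

def rollout_costs(target, controls, noise):
    """Toy per-timestep costs; a real system would execute the noisy
    controls on the robot and measure task cost."""
    u = controls + noise                          # (K, T, du)
    return np.sum((u - target) ** 2, axis=-1)     # (K, T)

dataset = []                                      # (observation, action) pairs
for iteration in range(5):
    target = sample_task_instance()               # new instance every iteration
    controls = np.zeros((T, du))                  # would come from the global policy
    noise = rng.normal(0.0, 0.1, size=(K, T, du))
    costs = rollout_costs(target, controls, noise)
    controls = pi2_update(controls, noise, costs) # PI^2 refinement (sketch above)
    obs = rng.normal(size=(T, dx))                # toy observations
    dataset += list(zip(obs, controls))           # feeds the global policy's training
```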
The practical contributions of the paper are validated through experiments on two robotic tasks: door opening and pick-and-place. The paper reports that its local policy optimizer outperforms traditional LQR-based methods, with higher success rates and improved task generalization. For instance, in the door opening task, the approach adapted to varied door poses and orientations without requiring additional local policy samples, achieving a 93.3% success rate on novel instances. In the pick-and-place task, the method attained an 86.7% success rate across different bottle poses and orientations after training.
This work has significant implications for the design of robotic systems capable of performing complex manipulation tasks in real-world environments. By enabling effective learning from high-dimensional sensory inputs, such as vision, the approach could be pivotal in enhancing the autonomy and adaptability of robots in unstructured settings.
Looking forward, the paper outlines potential future directions, such as combining the strengths of PI² and LQR-based methods to optimize trajectories in environments where the cost and dynamics are highly irregular. Future research might also explore reducing the reliance on human-provided demonstrations, possibly through more advanced stochastic search techniques or enhanced exploration strategies.
In conclusion, "Path Integral Guided Policy Search" provides a compelling case for the utilization of model-free stochastic optimization within the GPS framework, marking a significant step forward in learning nonlinear policies for dynamic and discontinuous robotic tasks.