Learning Deep Control Policies for Autonomous Aerial Vehicles with MPC-Guided Policy Search (1509.06791v2)

Published 22 Sep 2015 in cs.LG and cs.RO

Abstract: Model predictive control (MPC) is an effective method for controlling robotic systems, particularly autonomous aerial vehicles such as quadcopters. However, application of MPC can be computationally demanding, and typically requires estimating the state of the system, which can be challenging in complex, unstructured environments. Reinforcement learning can in principle forego the need for explicit state estimation and acquire a policy that directly maps sensor readings to actions, but is difficult to apply to unstable systems that are liable to fail catastrophically during training before an effective policy has been found. We propose to combine MPC with reinforcement learning in the framework of guided policy search, where MPC is used to generate data at training time, under full state observations provided by an instrumented training environment. This data is used to train a deep neural network policy, which is allowed to access only the raw observations from the vehicle's onboard sensors. After training, the neural network policy can successfully control the robot without knowledge of the full state, and at a fraction of the computational cost of MPC. We evaluate our method by learning obstacle avoidance policies for a simulated quadrotor, using simulated onboard sensors and no explicit state estimation at test time.

Authors (4)
  1. Tianhao Zhang (29 papers)
  2. Gregory Kahn (16 papers)
  3. Sergey Levine (531 papers)
  4. Pieter Abbeel (372 papers)
Citations (419)

Summary

Exploring MPC-Guided Policy Search for Autonomous Aerial Vehicles

The paper "Learning Deep Control Policies for Autonomous Aerial Vehicles with MPC-Guided Policy Search" presents a methodology that integrates Model Predictive Control (MPC) with Reinforcement Learning (RL) to train deep neural network policies for autonomous aerial vehicles, specifically quadcopters. The approach seeks to overcome the computational and state-estimation demands of traditional MPC and RL's tendency toward catastrophic failures during training, especially in dynamic and unstructured environments.

Methodological Synthesis

The research introduces an off-policy guided policy search framework that combines MPC's data efficiency with reinforcement learning's adaptability. The method employs MPC to generate trajectory-centric training data under full state observations in an instrumented training environment. This data is then used for supervised learning of a deep neural network policy that, at test time, relies solely on raw onboard sensor readings, eliminating explicit state estimation, which is often computationally expensive and unreliable in complex, unstructured environments.
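As a rough illustration, the data-generation step can be sketched as below. The simulator, sensor model, dimensions, and MPC stand-in are hypothetical placeholders chosen for brevity, not the paper's implementation; the point is only that MPC plans from the full state while the logged training pairs contain raw observations and MPC's actions.

```python
import numpy as np

class QuadrotorSim:
    """Toy instrumented simulator; all dynamics and dimensions are illustrative."""
    def reset(self):
        # 12-D full state: position, velocity, attitude, angular rates
        self.state = np.zeros(12)
        return self.state

    def step(self, action):
        # Placeholder dynamics; a real setup would integrate quadrotor equations
        self.state = self.state + 0.01 * np.random.randn(12)
        return self.state

    def sensor_readings(self):
        # Raw onboard observations, e.g. IMU readings plus range-sensor returns
        return np.concatenate([self.state[6:], np.random.rand(8)])


def mpc_action(state):
    # Placeholder for the MPC solver that uses the full state at training time
    return -0.1 * state[:4]  # 4 rotor commands (illustrative gain only)


def collect_guided_data(sim, episodes=10, horizon=100):
    """Roll out MPC under full-state observation, but log only
    (raw observation, MPC action) pairs for supervised policy training."""
    dataset = []
    for _ in range(episodes):
        state = sim.reset()
        for _ in range(horizon):
            action = mpc_action(state)      # MPC sees the full state
            obs = sim.sensor_readings()     # the policy will only ever see this
            dataset.append((obs, action))
            state = sim.step(action)
    return dataset
```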

The technique recasts policy training as a supervised learning problem, so the neural network policy can be optimized without ever executing a partially trained policy on the vehicle during data collection. MPC acts as a stabilizing teacher, guiding the search and ensuring that the collected data reflects near-optimal decisions, thereby avoiding the catastrophic failures that can occur when training with reinforcement learning alone.
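A minimal sketch of this supervised step is shown below, assuming a plain behavioral-cloning regression onto MPC's actions; the network size, optimizer settings, and loss are arbitrary assumptions, and the paper's full guided policy search additionally alternates between MPC and policy updates rather than fitting a fixed dataset once.

```python
import numpy as np
import torch
import torch.nn as nn

class SensorPolicy(nn.Module):
    """Maps raw sensor observations directly to rotor commands."""
    def __init__(self, obs_dim=14, act_dim=4, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, obs):
        return self.net(obs)


def train_policy(dataset, epochs=50, lr=1e-3):
    # Regress the policy's outputs onto the actions MPC chose for each observation
    obs = torch.as_tensor(np.stack([o for o, _ in dataset]), dtype=torch.float32)
    act = torch.as_tensor(np.stack([a for _, a in dataset]), dtype=torch.float32)
    policy = SensorPolicy(obs.shape[1], act.shape[1])
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(policy(obs), act)
        loss.backward()
        opt.step()
    return policy
```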

Empirical Evaluation

The methodology's effectiveness was validated in simulation on quadrotor tasks, particularly obstacle avoidance. The results show that MPC-guided policy search reliably trains policies without catastrophic failures during the learning phase. Moreover, the trained policies proved robust, successfully navigating complex obstacle courses under model perturbations including rotor biases and mass changes.

Notably, the learned neural network policies run at a fraction of the computational cost of the MPC controller, while matching or exceeding its performance in test environments with various model inaccuracies.
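The structural reason for this saving is that the deployed controller is a single forward pass per control step, whereas MPC re-solves a trajectory optimization at every step. The toy comparison below only illustrates that contrast; the dummy iterative loop is not the paper's MPC solver, and the timings are not the paper's measurements.

```python
import time
import numpy as np
import torch
import torch.nn as nn

def mpc_step(state, horizon=20, iters=50):
    # Dummy stand-in for iterative trajectory optimization re-run at each step
    plan = np.zeros((horizon, 4))
    for _ in range(iters):
        plan -= 1e-3 * (plan + state[:4])   # placeholder gradient-style update
    return plan[0]

policy = nn.Sequential(nn.Linear(14, 64), nn.ReLU(), nn.Linear(64, 4))
obs = torch.zeros(1, 14)
state = np.zeros(12)

t0 = time.perf_counter(); mpc_step(state); t_mpc = time.perf_counter() - t0
t0 = time.perf_counter(); policy(obs); t_nn = time.perf_counter() - t0
print(f"dummy MPC step: {t_mpc * 1e3:.2f} ms, policy forward pass: {t_nn * 1e3:.2f} ms")
```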

Implications and Future Directions

The implications of this research are multifaceted, advancing both the theoretical understanding and the practical implementation of autonomous control systems. By integrating MPC into the RL training pipeline, the framework gains computational efficiency at test time without sacrificing robustness. This is particularly relevant for applications where onboard computational capacity is limited yet accurate and timely decision-making is critical.

On a theoretical level, this paper contributes to the ongoing discourse on integrating model-based planning with deep learning frameworks, providing a structured pathway to leverage the strengths of both paradigms in solving complex dynamic control problems.

Looking forward, potential avenues for further exploration include extending this methodology to other robotic systems and environments. The framework could be enhanced with adaptive or model-agnostic MPC mechanisms, which could further improve the flexibility and robustness of training under varying real-world conditions. Additionally, integrating explicit state estimation within the guided policy search framework could broaden its applicability to settings where instrumenting the training environment for full state observation is impractical.

Conclusion

In summary, this paper introduces a viable and efficient strategy for learning control policies that leverages the strengths of both Model Predictive Control and Reinforcement Learning by casting policy training as supervised learning. The empirical results underscore the potential of this integrated approach for developing robust, computationally efficient control policies for autonomous aerial systems under varying environmental conditions and system dynamics. By avoiding training-phase failures and reducing computational overhead, the method positions itself as a compelling solution for real-world autonomous flight control tasks.