Control of a Quadrotor with Reinforcement Learning
The paper "Control of a Quadrotor with Reinforcement Learning" presents a significant advancement in employing reinforcement learning (RL) techniques to accomplish control tasks for quadrotors. Under the authorship of Jemin Hwangbo, Inkyu Sa, Roland Siegwart, and Marco Hutter, this research demonstrates the practical application of RL algorithms to bridge the gap between traditional predefined control structures and neural networks capable of directly associating states with actuator commands. The proposed approach is implemented using a novel deterministic on-policy learning algorithm, designed to enhance stability in complex missions without relying on predefined control frameworks.
Research Summary
The research explores the ability of a neural network policy to map sensor data directly to rotor thrust commands, bypassing conventional control architectures. The network is trained with reinforcement learning in simulation, and its robustness is validated through experiments on a real quadrotor. The novelty of the proposed algorithm lies in its conservatively designed deterministic on-policy formulation, which the authors argue is better suited to quadrotor control than existing RL algorithms.
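To make the direct state-to-thrust mapping concrete, here is a minimal sketch of such a policy network in PyTorch. The 18-dimensional state (rotation matrix, position, linear velocity, angular velocity) and the two 64-unit tanh hidden layers follow the general setup described in the paper, but the exact dimensions and layer sizes should be treated as illustrative assumptions rather than a faithful reproduction.

```python
import torch
import torch.nn as nn

class QuadrotorPolicy(nn.Module):
    """Maps the quadrotor state directly to four rotor thrust commands.

    The 18-D state (9 rotation-matrix entries, 3 position, 3 linear velocity,
    3 angular velocity) and the 64-unit tanh hidden layers mirror the kind of
    setup described in the paper; treat the exact sizes as assumptions.
    """
    def __init__(self, state_dim: int = 18, action_dim: int = 4, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, action_dim),  # raw thrust commands, one per rotor
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

# Example: one forward pass from a (placeholder) state to rotor thrusts.
policy = QuadrotorPolicy()
state = torch.randn(18)     # placeholder state vector
thrusts = policy(state)     # 4 thrust commands
```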
A key result is that the trained policy executes in only 7 microseconds per timestep, roughly two orders of magnitude faster than trajectory optimization approaches that rely on an approximated model. This reduction in computational overhead is a substantial advantage, leaving more room for other onboard processes such as state estimation and object detection.
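As a rough illustration of how such a per-timestep latency claim can be checked, the snippet below times repeated forward passes of a small policy network. The numbers it produces depend entirely on the hardware and software stack and are not the paper's measurement; the stand-in network and function name are assumptions for the sketch.

```python
import time
import torch
import torch.nn as nn

# A stand-in policy with the same shape as the one sketched above.
policy = nn.Sequential(
    nn.Linear(18, 64), nn.Tanh(),
    nn.Linear(64, 64), nn.Tanh(),
    nn.Linear(64, 4),
)

def mean_inference_time_us(net: nn.Module, state_dim: int = 18, iters: int = 10_000) -> float:
    """Average wall-clock time of a single policy evaluation, in microseconds."""
    state = torch.randn(state_dim)
    with torch.no_grad():
        net(state)                                   # warm-up pass
        start = time.perf_counter()
        for _ in range(iters):
            net(state)
        return (time.perf_counter() - start) / iters * 1e6

print(f"{mean_inference_time_us(policy):.1f} microseconds per evaluation")
```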
Methodology
The learning framework is based on deterministic policy optimization with natural gradient descent, which offers several advantages over stochastic policy methods. By relying on low-variance on-policy sample evaluations, the authors obtain a stable and computationally efficient learning process. The algorithm also uses a new exploration strategy to ensure informative sample collection, which contributes to reliable performance from diverse initial conditions, including extreme ones such as being thrown upside down.
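A generic natural-gradient policy update, not the authors' exact algorithm, can be sketched as follows: the ordinary gradient of the objective is preconditioned by the inverse of a metric matrix, solved with conjugate gradient so the matrix is never inverted explicitly. Everything below (the function names, the Gauss-Newton-style metric built from per-sample Jacobians of a deterministic policy) is an illustrative assumption, not the paper's formulation.

```python
import numpy as np

def conjugate_gradient(matvec, g, iters: int = 10, tol: float = 1e-10) -> np.ndarray:
    """Solve F x = g using only matrix-vector products with F."""
    x = np.zeros_like(g)
    r = g.copy()
    p = r.copy()
    rs = r @ r
    for _ in range(iters):
        Fp = matvec(p)
        alpha = rs / (p @ Fp + 1e-12)
        x += alpha * p
        r -= alpha * Fp
        rs_new = r @ r
        if rs_new < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def natural_gradient_step(theta, grad, jacobians, lr: float = 0.05):
    """One natural-gradient ascent step: precondition `grad` (gradient of the
    objective w.r.t. the policy parameters) by the inverse of a Gauss-Newton
    style metric F = mean_i J_i^T J_i, where each J_i is the Jacobian of the
    deterministic policy outputs w.r.t. the parameters at sample i."""
    def metric_matvec(v):
        return np.mean([J.T @ (J @ v) for J in jacobians], axis=0) + 1e-6 * v  # damping
    direction = conjugate_gradient(metric_matvec, grad)
    return theta + lr * direction
```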
The policy and value networks use compact architectures chosen for stability and for general applicability across a range of quadrotor configurations, without tedious per-vehicle parameter tuning during training. The paper then evaluates the learning outcome under varied parameters and constraints, using metrics such as waypoint tracking precision and recovery stability.
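For completeness, a value network in the same spirit (a small tanh MLP mapping the state to a scalar value estimate, used only during training) might look like the sketch below; the 128-unit width and class name are assumptions for illustration, not figures taken from the paper.

```python
import torch
import torch.nn as nn

class QuadrotorValue(nn.Module):
    """State-value estimator used only during training; the 128-unit tanh
    layers are an illustrative choice, not a reported figure."""
    def __init__(self, state_dim: int = 18, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),   # scalar value estimate
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state).squeeze(-1)
```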
Results and Implications
The experimental validation covers simulated and real-world waypoint tracking as well as autonomous recovery from adverse initial conditions, such as being launched in a destabilized configuration. These tests highlight the practical potential of machine-learned policies and point to further improvements in areas such as adaptation to environmental disturbances and dynamic parameter tuning. The results underscore the viability of machine learning methods for delivering performant control with reduced human intervention and controller-specific tuning.
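As one concrete way to quantify the kind of tracking precision reported in such tests, below is a hedged sketch of a waypoint tracking error metric; the function name and the root-mean-square formulation are illustrative choices, not the paper's exact evaluation metric.

```python
import numpy as np

def rms_waypoint_error(positions: np.ndarray, waypoint: np.ndarray) -> float:
    """Root-mean-square distance between the flown positions (T x 3 array)
    and the commanded waypoint (3-vector); an illustrative precision metric."""
    errors = np.linalg.norm(positions - waypoint, axis=1)
    return float(np.sqrt(np.mean(errors ** 2)))

# Example: a placeholder flight log hovering near a waypoint at the origin.
flown = np.random.normal(loc=0.0, scale=0.05, size=(500, 3))
print(f"RMS waypoint error: {rms_waypoint_error(flown, np.zeros(3)):.3f} m")
```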
Moreover, the prospect of RL-based controllers matching and even outperforming traditional control mechanisms signals a shift in the robotics domain toward greater autonomy and less reliance on explicit dynamic modeling. This shift paves the way for broader applications in complex aerial robotics tasks and beyond.
Future Directions
The paper proposes future work on improving the simulation environment for more accurate system modeling and on incorporating recurrent neural networks (RNNs) to adapt to model inaccuracies online. Transfer learning techniques could further improve quadrotor performance and adaptability when facing previously unseen dynamics or environmental perturbations.
Overall, the paper demonstrates the dynamic control capabilities achievable with modern RL techniques and provides a strong starting point for subsequent research into learned autonomous control systems across various domains.