Learning to Fly in Seconds
The paper "Learning to Fly in Seconds" by Jonas Eschmann, Dario Albani, and Giuseppe Loianno presents a novel approach to the rapid training of reinforcement learning (RL)-based controllers for quadrotors. The authors propose an asymmetric actor-critic-based architecture combined with a highly optimized simulator, which enables end-to-end control of quadrotors using RPM outputs.
Overview
Quadrotors, a type of multi-rotor UAV, present significant control challenges due to their complex dynamics and the need for precise motor control. Traditional control methods require considerable domain expertise and are often platform-specific. By leveraging RL, the authors aim to streamline deployment and enhance the performance of quadrotor control systems. Their approach specifically targets the simulation-to-reality (sim2real) gap and the traditionally long training times associated with RL.
Methodology
The core of the proposed methodology is an RL training pipeline that directly maps quadrotor states to motor RPM commands. Key components of their approach include:
- Asymmetric Actor-Critic Architecture: The critic has access to privileged information, including the ground-truth state and disturbance parameters, that is unavailable to the actor; this stabilizes training (see the first sketch after this list).
- Optimized Simulator: The authors developed a highly efficient simulator that executes dynamics simulations at approximately 1284 million steps per second on a consumer-grade GPU, a substantial speedup over existing simulators such as Flightmare (a toy dynamics step is sketched below).
- Curriculum Learning: To improve sample efficiency, training follows a curriculum that gradually increases the complexity of the environment and the penalties in the reward function (see the curriculum sketch below).
- Action History for Partial Observability: The actor receives a history of its previous actions to mitigate the impact of motor delays and partial observability, making the policy more robust; this input is reflected in the first sketch below.
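To make the asymmetric observation split and the action-history input concrete, the following PyTorch sketch assembles plausible actor and critic inputs. All dimensions, the history length, and the network sizes are assumptions for illustration; the paper's actual implementation is the authors' C++ library (RLtools), not PyTorch.

```python
import torch
import torch.nn as nn

STATE_DIM = 13      # e.g. position, orientation quaternion, linear + angular velocity
ACTION_DIM = 4      # one RPM command per motor
HISTORY_LEN = 32    # number of past actions fed to the actor (assumed value)
PRIVILEGED_DIM = 8  # disturbance/model parameters visible only in simulation

def mlp(in_dim, out_dim):
    """Small fully connected network; sizes are illustrative."""
    return nn.Sequential(
        nn.Linear(in_dim, 64), nn.Tanh(),
        nn.Linear(64, 64), nn.Tanh(),
        nn.Linear(64, out_dim),
    )

# Actor: sees only the observable state plus its own recent actions,
# which lets it compensate for motor delay under partial observability.
actor = mlp(STATE_DIM + HISTORY_LEN * ACTION_DIM, ACTION_DIM)

# Critic: additionally conditioned on privileged information that exists
# only in simulation, yielding better-informed value estimates.
critic = mlp(STATE_DIM + PRIVILEGED_DIM, 1)

obs = torch.randn(1, STATE_DIM)
action_history = torch.randn(1, HISTORY_LEN * ACTION_DIM)
privileged = torch.randn(1, PRIVILEGED_DIM)

rpm_command = torch.tanh(actor(torch.cat([obs, action_history], dim=-1)))
value = critic(torch.cat([obs, privileged], dim=-1))
```

Because the privileged inputs are used only by the critic, the trained actor can run on real hardware where that information is unavailable.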
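To give a sense of what one "dynamics simulation step" involves, here is a toy translational-only step with a first-order motor model. All constants are illustrative assumptions; the paper's actual model (full rigid-body attitude dynamics, rotor torques) is considerably richer than this.

```python
import numpy as np

DT = 0.01             # integration step in seconds (assumed)
MASS = 0.027          # kg, roughly a Crazyflie
GRAVITY = np.array([0.0, 0.0, -9.81])
MOTOR_TAU = 0.03      # motor time constant in seconds (assumed)
THRUST_COEFF = 3e-10  # thrust per RPM^2 (illustrative)

def step(pos, vel, rpm, rpm_cmd, rotation):
    """One explicit-Euler step of translational dynamics with motor lag."""
    # First-order lag: real motors cannot change RPM instantaneously,
    # which is exactly why the actor benefits from an action history.
    rpm = rpm + (DT / MOTOR_TAU) * (rpm_cmd - rpm)
    # Total thrust acts along the body z-axis; rotate it into the world frame.
    thrust_body = np.array([0.0, 0.0, THRUST_COEFF * np.sum(rpm ** 2)])
    acc = GRAVITY + rotation @ thrust_body / MASS
    vel = vel + DT * acc
    pos = pos + DT * vel
    return pos, vel, rpm

# At a level attitude, ~15000 RPM per motor roughly balances gravity
# for the constants chosen above.
pos, vel, rpm = np.zeros(3), np.zeros(3), np.zeros(4)
pos, vel, rpm = step(pos, vel, rpm, np.full(4, 15000.0), np.eye(3))
```

A step this small is what makes throughput in the hundreds of millions of steps per second plausible once the computation is vectorized and batched.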
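One common way to realize such a reward curriculum, in the spirit of what the paper describes, is to scale penalty weights up multiplicatively on a fixed schedule, so early (still poor) policies receive informative reward signals while later training is held to a stricter standard. The starting weights, interval, growth factor, and cap below are assumed values, not the paper's.

```python
BASE_WEIGHTS = {"action": 0.01, "velocity": 0.01}  # assumed starting penalties

def penalty_weights(env_step, interval=1000, factor=1.2, cap=0.5):
    """Multiplicatively increase penalty weights every `interval` env steps."""
    scale = factor ** (env_step // interval)
    return {name: min(w * scale, cap) for name, w in BASE_WEIGHTS.items()}

print(penalty_weights(0))       # mild penalties while the policy is still poor
print(penalty_weights(20000))   # much stronger penalties later in training
```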
Results
The authors demonstrate that a policy trained for just 18 seconds on a consumer-grade laptop transfers directly from simulation to real hardware. In extensive experiments on a Crazyflie nano quadrotor, the learned controller achieves trajectory-tracking performance competitive with state-of-the-art control methods.
Implications
Practical Implications:
- The approach substantially lowers the barrier to entry for developing and deploying RL-based quadrotor controllers.
- It facilitates rapid prototyping and deployment on consumer-grade hardware without the need for large-scale computational resources.
Theoretical Implications:
- The paper extends the understanding of how RL can be applied to complex, real-world control problems with low-level, direct motor outputs.
- The use of asymmetric actor-critic structures and curriculum learning demonstrates effective strategies for dealing with partial observability and sample inefficiency in RL.
Future Work
The authors suggest several avenues for future research:
- Extending the framework to enable adaptive control, possibly through meta-RL, to handle changes in environment and system parameters such as battery level and wind conditions.
- Improving the robustness and performance of the trained policies through automatic hyperparameter optimization.
Conclusion
This paper offers a rigorous and detailed exploration of applying RL to quadrotor control, providing a practical methodology that significantly reduces training time and enhances the feasibility of sim2real transfer. The open-source nature of their code and simulator promises to democratize research and accelerate advancements in autonomous aerial vehicle control.