Introduction
Autonomous quadcopters have become one of robotics' notable success stories and are essential in applications that demand rapid, agile flight. Achieving time-optimal control for these vehicles, however, is hindered by the sim-to-real gap: behaviors learned in simulation often fail to transfer cleanly to real-world flight.
Methodology
Researchers have developed an end-to-end reinforcement learning (E2E RL) system that outputs motor commands directly, without relying on low-level controllers. The approach combines a learned residual model with an adaptation method that compensates for modeling inaccuracies, aiming to bridge the reality gap.
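To make the residual-model idea concrete, the sketch below shows one common way such a correction can be structured. It is a minimal, hypothetical example, not the authors' implementation: the state layout, network sizes, and function names are all assumptions. A small network learns the mismatch between the nominal model's predicted state derivative and the one observed in flight, and its output is added to the nominal dynamics.

```python
import torch
import torch.nn as nn

class ResidualDynamics(nn.Module):
    """Hypothetical residual model: learns the mismatch between the nominal
    quadcopter dynamics and the dynamics observed in real flight data."""

    def __init__(self, state_dim=13, action_dim=4, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, state_dim),  # correction to the state derivative
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

def hybrid_derivative(state, action, nominal_f, residual):
    """State derivative = first-principles model + learned correction."""
    return nominal_f(state, action) + residual(state, action)
```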
The methodology pairs a quadcopter model with two trained controllers: the E2E network, which commands the motors directly, and an INDI (Incremental Nonlinear Dynamic Inversion) network, which outputs thrust and body-rate commands for a low-level controller to track. Each network has its own architecture and inputs, and both are trained within a Markov Decision Process (MDP) framework.
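The key difference between the two controllers is the action interface. The sketch below is illustrative only (the observation size, hidden widths, and action scaling are assumptions): both policies consume the same MDP observation, but the E2E policy emits four motor commands directly, while the INDI policy emits a collective thrust and three body rates that the low-level controller then tracks.

```python
import torch
import torch.nn as nn

def make_policy(obs_dim, act_dim, hidden=64):
    """Small feedforward policy; the architecture is an assumption."""
    return nn.Sequential(
        nn.Linear(obs_dim, hidden), nn.Tanh(),
        nn.Linear(hidden, hidden), nn.Tanh(),
        nn.Linear(hidden, act_dim), nn.Tanh(),  # actions normalized to [-1, 1]
    )

OBS_DIM = 18  # e.g. position, velocity, attitude, and gate information

e2e_policy = make_policy(OBS_DIM, 4)   # four motor commands, sent directly
indi_policy = make_policy(OBS_DIM, 4)  # collective thrust + three body rates

obs = torch.zeros(OBS_DIM)
motor_cmds = e2e_policy(obs)     # bypasses the low-level controller
thrust_rates = indi_policy(obs)  # tracked by the low-level INDI controller
```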
Experimental Setup
The E2E and INDI networks were tested in practice on a Parrot Bebop 1 quadcopter in a controlled environment. The Bebop was chosen for its flexible frame, which makes the control problem non-trivial for both networks. Commands were computed in real time on the Bebop's onboard processor, while an OptiTrack motion-capture system provided precise position and attitude data.
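A rough sketch of how such a pipeline might be wired together is given below. The class interface, message contents, and loop rate are assumptions for illustration, not the actual flight stack: the loop reads the latest OptiTrack pose, assembles an observation, and writes the policy's motor commands.

```python
import time
import numpy as np

def control_loop(mocap, policy, send_motor_cmds, rate_hz=500):
    """Hypothetical onboard loop; `mocap`, `policy`, and `send_motor_cmds`
    are stand-ins for the motion-capture client, trained network, and
    motor driver. A real loop would use a fixed-rate scheduler."""
    prev_pos, prev_t = None, None
    while True:
        pos, quat, t = mocap.latest_pose()  # OptiTrack position and attitude
        vel = np.zeros(3) if prev_pos is None else (pos - prev_pos) / (t - prev_t)
        obs = np.concatenate([pos, vel, quat])  # minimal state vector
        send_motor_cmds(policy(obs))            # direct motor commands (E2E)
        prev_pos, prev_t = pos, t
        time.sleep(1.0 / rate_hz)
```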
Results & Discussion
The E2E approach showed significant promise: the E2E network completed the track 1.39 seconds faster than the state-of-the-art approach in simulation and 0.17 seconds faster in real-world tests. The advantage was most pronounced on the first lap, flown from a hover; on subsequent laps the two approaches converged. While both techniques proved robust in simulation, real-world flights exposed a larger reality gap, especially for the E2E framework, suggesting it is more sensitive to modeling errors.
The E2E network's direct handling of motor commands and its real-time adaptation present a promising avenue for further research on quadcopter performance. Refining the E2E network with offline reinforcement learning on real-world flight data, and accounting for further model discrepancies such as battery voltage sag or per-motor maximum-RPM variance, could yield additional performance gains.
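As a sketch of what such offline refinement might look like (a behavior-cloning-style stand-in for full offline RL; the dataset format and loss below are assumptions, not the paper's method), the policy could be regressed toward motor commands logged during real flights:

```python
import torch
import torch.nn as nn

def offline_finetune(policy, flight_log, epochs=10, lr=1e-4):
    """Hypothetical refinement step: nudge the policy toward the motor
    commands recorded on the real vehicle. `flight_log` is assumed to be
    a pair of tensors (observations [N, obs_dim], motor commands [N, 4])."""
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    obs, cmds = flight_log
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(policy(obs), cmds)  # behavior-cloning-style loss
        loss.backward()
        opt.step()
    return policy
```

Model discrepancies such as battery voltage or per-motor maximum RPM could similarly be randomized in simulation before a fine-tuning pass like this one.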