- The paper introduces a comprehensive infrastructure for training and deploying quadrotor DRL policies, bridging the sim-to-real gap with integrated simulation, algorithms, and hardware.
- Key components include a high-speed simulator (AirGym) and a real-time edge node for efficient DRL training and onboard inference.
- Experimental evaluations validate the platform's efficacy for complex tasks, including high-speed obstacle avoidance and outdoor navigation in real environments.
Summary of Quadrotor Deep Reinforcement Learning Infrastructure and Workflow
The paper entitled "A General Infrastructure and Workflow for Quadrotor Deep Reinforcement Learning and Reality Deployment" presents an integrated platform for training learning-based quadrotor control policies and deploying them in real-world environments. The authors introduce a comprehensive system that combines a simulation environment, flight dynamics and control, Deep Reinforcement Learning (DRL) algorithms, a MAVROS-based middleware stack, and a physical hardware setup. This platform streamlines the transition of DRL policies from simulation to direct application in real outdoor flights.
Overview of the System and Methodology
The proposed infrastructure comprises distinct components aimed at overcoming the sim-to-real gap, a major challenge in deploying learning-based methods on quadrotors. Specifically, it addresses the need for large volumes of simulated training data, real-time processing demands on onboard hardware, and discrepancies between simulated models and real-world dynamics. Key aspects of the system include:
- Quadrotor Platform X152b: This hardware platform pairs a Rockchip RK3588s processor with a PX4 autopilot and an Intel RealSense D430 for onboard perception. The setup provides ample onboard compute and accurate depth sensing, supporting onboard inference and adaptation in dynamic environments.
- AirGym Simulation Environment: Built upon IsaacGym, AirGym enables fast training through vectorized environments and GPU acceleration. It hosts five DRL task types: hovering, dynamic obstacle avoidance, trajectory tracking, balloon hitting, and planning in unknown environments.
- Parallelized Geometric Controller (rlPx4Controller): This tool mirrors the PX4 control logic in simulation and supports parallel computation across multiple control levels: position and yaw (PY), linear velocity (LV), collective thrust with attitude (CTA), collective thrust with body rates (CTBR), and single-rotor thrust (SRT).
- AirGym-Real: Deployed onboard as a real-time ROS node, this component manages sensor data and performs neural inference at the edge, integrating visual-inertial pose estimation so that state estimation works without external positioning systems.
- Control Bridge (control_for_gym): This middleware layer forwards actions from the DRL policies through MAVROS to the PX4 autopilot, enabling seamless command execution at any of the control levels above.
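The vectorized-environment idea behind AirGym can be sketched with toy dynamics: every environment lives in one row of a batched array, so a single NumPy (or, in the real system, GPU tensor) operation steps thousands of quadrotors at once. The class and dynamics below are illustrative assumptions, not AirGym's actual API; the real simulator steps full quadrotor dynamics on IsaacGym.

```python
import numpy as np

class VecHoverEnv:
    """Toy batched hovering task: N point-mass 'quadrotors' stepped at once."""

    def __init__(self, num_envs: int, dt: float = 0.02):
        self.num_envs = num_envs
        self.dt = dt
        self.pos = np.zeros((num_envs, 3))
        self.vel = np.zeros((num_envs, 3))

    def reset(self):
        # Randomize start positions for all environments in one call.
        self.pos = np.random.uniform(-1.0, 1.0, (self.num_envs, 3))
        self.vel = np.zeros((self.num_envs, 3))
        return np.concatenate([self.pos, self.vel], axis=1)

    def step(self, accel_cmd):
        # One batched integration step advances every environment together.
        self.vel += accel_cmd * self.dt
        self.pos += self.vel * self.dt
        # Hovering reward: stay near the origin.
        reward = -np.linalg.norm(self.pos, axis=1)
        obs = np.concatenate([self.pos, self.vel], axis=1)
        return obs, reward

env = VecHoverEnv(num_envs=4096)
obs = env.reset()
obs, reward = env.step(np.zeros((4096, 3)))
print(obs.shape, reward.shape)  # (4096, 6) (4096,)
```

Because every per-environment quantity is an array axis, the same code scales from 16 to tens of thousands of environments without a Python-level loop, which is what makes DRL sample collection fast.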
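The layered control levels (PY, LV, CTA, CTBR, SRT) form a cascade: a policy acting at a higher level hands its setpoint to the loops below it, while a policy acting at a lower level bypasses the outer loops. The sketch below shows that entry-point idea with simple proportional gains and a small-angle tilt mapping; the gains and mapping are hypothetical and do not reflect PX4's or rlPx4Controller's actual tuning.

```python
import numpy as np

# Illustrative gains only (not PX4's actual tuning).
KP_POS, KP_VEL, KP_ATT = 1.0, 2.0, 4.0
G = 9.81  # gravity, m/s^2

def cascade(level, cmd, pos, vel, att):
    """Enter the control cascade at `level` and return a body-rate command.

    level: "PY", "LV", "CTA", or "CTBR". An SRT policy would bypass the
    cascade entirely and drive the rotors directly. The innermost rate
    loop (rates -> motor thrusts) is omitted here for brevity.
    """
    if level == "PY":   # position setpoint -> velocity setpoint
        cmd = KP_POS * (cmd - pos)
        level = "LV"
    if level == "LV":   # velocity setpoint -> attitude setpoint
        accel = KP_VEL * (cmd - vel)
        # Small-angle sketch: lateral acceleration maps to roll/pitch tilt.
        cmd = np.array([accel[1] / G, -accel[0] / G, 0.0])
        level = "CTA"
    if level == "CTA":  # attitude setpoint -> body-rate setpoint
        cmd = KP_ATT * (cmd - att)
    return cmd          # body rates ("CTBR" input passes straight through)

state = (np.zeros(3), np.zeros(3), np.zeros(3))  # pos, vel, att at rest
body_rates = cascade("PY", np.array([1.0, 0.0, 0.0]), *state)
```

A request to move 1 m along +x thus propagates down to a pitch-rate command. Running this cascade as batched tensor operations across all simulated environments, rather than once per vehicle, is what the paper's "parallelized" controller refers to.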
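The control bridge's job can be pictured as a dispatch step: given a policy action and its control level, pick the matching MAVROS setpoint interface and package the action for it. The routing function below is a hypothetical sketch, not control_for_gym's actual code; the topic names are standard MAVROS setpoint interfaces, but the message construction here is simplified to plain dictionaries rather than real rospy message objects.

```python
def route_action(level, action):
    """Map a DRL action at a given control level to a MAVROS-style setpoint.

    Hypothetical dispatch: the real bridge publishes typed ROS messages;
    here a dict stands in for the message to keep the sketch self-contained.
    """
    if level in ("PY", "LV"):
        # PositionTarget on /mavros/setpoint_raw/local carries position or
        # velocity setpoints, selected via the message's type_mask field.
        return {"topic": "/mavros/setpoint_raw/local",
                "msg": "PositionTarget",
                "setpoint": list(action)}
    if level in ("CTA", "CTBR"):
        # AttitudeTarget on /mavros/setpoint_raw/attitude carries either an
        # orientation or body rates, plus a collective thrust value.
        return {"topic": "/mavros/setpoint_raw/attitude",
                "msg": "AttitudeTarget",
                "setpoint": list(action)}
    raise ValueError(f"unsupported control level: {level}")

sp = route_action("CTBR", [0.0, -0.8, 0.0, 0.5])
print(sp["topic"])  # /mavros/setpoint_raw/attitude
```

Keeping this routing in one thin layer means the same trained policy can be exercised against the simulator and the real PX4 stack without changing the policy code, which is the crux of the sim-to-real workflow.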
Experimental Evaluation
The authors conducted extensive experiments in both simulated and real-world scenarios, validating the efficacy of the infrastructure. For example, in dynamic obstacle avoidance tasks, the quadrotor demonstrated robust evasion of fast-moving objects with velocities up to 15 m/s. The platform also handled complex navigation and planning tasks in outdoor environments cluttered with natural obstacles.
Implications and Future Directions
The infrastructure detailed in this paper offers a robust approach to sim-to-real transfer in quadrotor applications, enhancing the reproducibility and scalability of DRL in aerial robotic systems. By providing open-source access, the work facilitates research and development within the community, potentially accelerating advancements in autonomous UAV navigation, obstacle avoidance, and precision tracking in varied operational settings.
Future developments should focus on improving the generalization of DRL policies across broader variations in environmental conditions and on more efficient data utilization during training. Moreover, meeting onboard computational constraints without sacrificing latency remains an important challenge for real-time adaptability in adverse conditions.
In conclusion, the proposed platform marks significant progress in DRL deployment for quadrotors, offering a practical and comprehensive framework that bridges the gap between simulation accuracy and real-world applicability.