
Motion Control in Multi-Rotor Aerial Robots Using Deep Reinforcement Learning (2502.05996v2)

Published 9 Feb 2025 in cs.RO and cs.AI

Abstract: This paper investigates the application of Deep Reinforcement Learning (DRL) to address motion control challenges in drones for additive manufacturing (AM). Drone-based additive manufacturing promises flexible and autonomous material deposition in large-scale or hazardous environments. However, achieving robust real-time control of a multi-rotor aerial robot under varying payloads and potential disturbances remains challenging. Traditional controllers like PID often require frequent parameter re-tuning, limiting their applicability in dynamic scenarios. We propose a DRL framework that learns adaptable control policies for multi-rotor drones performing waypoint navigation in AM tasks. We compare Deep Deterministic Policy Gradient (DDPG) and Twin Delayed Deep Deterministic Policy Gradient (TD3) within a curriculum learning scheme designed to handle increasing complexity. Our experiments show TD3 consistently balances training stability, accuracy, and success, particularly when mass variability is introduced. These findings provide a scalable path toward robust, autonomous drone control in additive manufacturing.

Summary

  • The paper introduces a DRL framework that adapts UAV motion control to variable mass and disturbances in additive manufacturing.
  • It compares DDPG and TD3 algorithms and uses curriculum learning to enhance training stability and precision.
  • The study models the UAV-AM problem as an MDP with waypoint navigation metrics, demonstrating robust performance improvements.

The paper, "Motion Control in Multi-Rotor Aerial Robots Using Deep Reinforcement Learning," presents an advanced framework aiming to enhance the motion control of multi-rotor drones employed in Additive Manufacturing (AM) by leveraging Deep Reinforcement Learning (DRL). This paper explores the integration of UAVs with AM technology, focusing on flexible, autonomous material deposition in challenging environments where precise control becomes a critical factor.

Key Contributions:

  1. DRL Framework for UAV Control:
    • The paper develops a DRL system that adapts dynamically to the variability in mass and the external disturbances encountered during AM tasks. Traditional controllers such as PID rely on fixed gains and require frequent re-tuning, making them ill-suited to dynamically changing conditions.
  2. Algorithm Comparison and Curriculum Learning:
    • The paper evaluates two DRL algorithms: Deep Deterministic Policy Gradient (DDPG) and Twin Delayed Deep Deterministic Policy Gradient (TD3); a minimal comparison sketch follows this list. Curriculum learning plays a crucial role in the agent's training by gradually increasing task complexity, yielding robust and scalable control policies. TD3 in particular maintains training stability and precision, especially under varying payload conditions.
  3. Modeling as a Markov Decision Process (MDP):
    • The UAV-AM control problem is structured as an MDP with clearly defined state and action spaces, rewards, and constraints; a toy environment in this spirit is sketched after this list. Critical state variables include accelerations along all three axes, which help the agent infer mass changes and adapt its thrust accordingly.
  4. Waypoint-Based Navigation:
    • This methodology simplifies the control problem, allowing discrete evaluations through predefined waypoints. The effectiveness of control policies is assessed based on metrics such as positional error, precision, and average cumulative reward.
  5. Curriculum Learning for Enhanced Training:
    • Employing curriculum learning, the agent initially tackles simpler tasks before transitioning to more complex scenarios involving dynamic waypoints, variable payloads, and external disturbances. This approach significantly stabilizes training and enhances adaptability.
  6. Performance Metrics:
    • Detailed experiments compare the performance of DDPG and TD3 using average cumulative reward, positional accuracy, precision, and success ratio. Notably, TD3 outperformed DDPG, achieving markedly higher precision and success rates.
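
To make the MDP formulation concrete, the following is a minimal, self-contained Python sketch of a waypoint-tracking environment with an episode-varying mass. The paper builds its environment in MATLAB/Simulink, so everything below (the point-mass dynamics, bounds, timestep, and reward weights) is an illustrative assumption rather than the authors' model; only the choice of state variables (waypoint error, velocity, and acceleration) follows the paper's description.

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces


class WaypointDepositionEnv(gym.Env):
    """Minimal point-mass stand-in for the paper's UAV-AM control MDP.

    The observation concatenates waypoint position error, velocity, and
    acceleration; including acceleration is what lets the agent infer the
    hidden, changing mass. Dynamics and constants here are assumptions.
    """

    GRAVITY = 9.81
    DT = 0.05  # control timestep in seconds (assumed)

    def __init__(self):
        # [pos_err(3), vel(3), accel(3)] -> 9-dimensional observation
        self.observation_space = spaces.Box(-np.inf, np.inf, (9,), np.float32)
        # Normalized 3-axis force commands in [-1, 1]
        self.action_space = spaces.Box(-1.0, 1.0, (3,), np.float32)
        self.max_force = 30.0  # actuator limit in newtons (assumed)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.mass = self.np_random.uniform(1.0, 2.0)  # payload varies per episode
        self.pos = np.zeros(3)
        self.vel = np.zeros(3)
        self.acc = np.zeros(3)
        self.waypoint = self.np_random.uniform(-5.0, 5.0, size=3)
        self.steps = 0
        return self._obs(), {}

    def step(self, action):
        force = np.clip(action, -1.0, 1.0) * self.max_force
        # Acceleration depends on the hidden mass, which is what makes the
        # mass observable to the agent through the acceleration channel.
        self.acc = force / self.mass - np.array([0.0, 0.0, self.GRAVITY])
        self.vel += self.acc * self.DT
        self.pos += self.vel * self.DT
        # Crude stand-in for material deposition: mass drifts downward.
        self.mass = max(0.8, self.mass - 0.001)
        self.steps += 1
        err = float(np.linalg.norm(self.waypoint - self.pos))
        reward = -err - 0.01 * float(np.linalg.norm(action))  # tracking + effort penalty
        terminated = err < 0.1          # waypoint reached
        truncated = self.steps >= 400   # episode budget
        return self._obs(), reward, terminated, truncated, {}

    def _obs(self):
        return np.concatenate(
            [self.waypoint - self.pos, self.vel, self.acc]
        ).astype(np.float32)
```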
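
The DDPG-versus-TD3 comparison can be reproduced in spirit with off-the-shelf implementations. The sketch below uses stable-baselines3 on the toy environment above; the paper's agents are built in MATLAB's Reinforcement Learning Toolbox, so this is an illustrative stand-in, and the training budget is an arbitrary choice.

```python
# Hedged sketch: compare DDPG and TD3 on the toy environment above using
# stable-baselines3 (the paper uses MATLAB's RL Toolbox instead).
from stable_baselines3 import DDPG, TD3
from stable_baselines3.common.evaluation import evaluate_policy

env = WaypointDepositionEnv()
for algo in (DDPG, TD3):
    model = algo("MlpPolicy", env, verbose=0)
    model.learn(total_timesteps=100_000)  # budget is arbitrary
    mean_r, std_r = evaluate_policy(model, env, n_eval_episodes=20)
    print(f"{algo.__name__}: mean return {mean_r:.1f} +/- {std_r:.1f}")
```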

Experiments and Results:

  • Simulation Environment: MATLAB’s Simulink, along with RL and UAV toolboxes, was used to simulate UAV dynamics in AM. The environment operates in a continuous loop to provide real-time feedback for the RL agent.
  • Task Complexity Management: Experiments included testing the agent’s performance in navigational tasks over fixed waypoints, adapting to dynamic mass changes along the flight path due to material deposition activities. Introducing accelerations into the observation space allowed the TD3 agent to adapt its thrust management strategy effectively under these variable mass scenarios.
  • Convergence and Robustness: Curriculum learning substantially improved the convergence speed and robustness of the learned policies in complex environments; a staged training loop in this style is sketched below. The curriculum-trained TD3 achieved higher cumulative rewards and success rates in waypoint-navigation tasks (see the evaluation sketch that follows), substantiating its advantage in handling complex UAV navigation under dynamic conditions.
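
A staged training loop in the spirit of the paper's curriculum might look as follows. The stage parameters (moving waypoints, mass range, wind) are hypothetical constructor arguments, not part of the toy environment sketched earlier, and the per-stage budgets are arbitrary.

```python
# Hedged sketch of curriculum training: the same TD3 agent is carried
# through stages of increasing difficulty. The stage keyword arguments
# are hypothetical extensions of the toy environment; the paper's exact
# stage definitions and budgets differ.
from stable_baselines3 import TD3

stages = [
    dict(moving_waypoints=False, mass_range=(1.5, 1.5), wind=0.0),  # fixed waypoints, fixed mass
    dict(moving_waypoints=True,  mass_range=(1.5, 1.5), wind=0.0),  # dynamic waypoints
    dict(moving_waypoints=True,  mass_range=(1.0, 2.0), wind=2.0),  # variable payload + disturbance
]

model = TD3("MlpPolicy", WaypointDepositionEnv(**stages[0]), verbose=0)
for stage in stages:
    model.set_env(WaypointDepositionEnv(**stage))  # hypothetical kwargs
    model.learn(total_timesteps=50_000, reset_num_timesteps=False)
```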
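
The evaluation metrics named above (final positional error, success ratio, cumulative reward) can be computed with a generic rollout loop such as the one below; the success radius is an assumed threshold, not a value reported in the paper.

```python
import numpy as np


def evaluate(model, env, episodes=50, success_radius=0.1):
    """Return mean final positional error, success ratio, and mean return.

    The success radius is an assumed threshold, not the paper's value.
    """
    errors, successes, returns = [], 0, []
    for _ in range(episodes):
        obs, _ = env.reset()
        done, ep_return = False, 0.0
        while not done:
            action, _ = model.predict(obs, deterministic=True)
            obs, reward, terminated, truncated, _ = env.step(action)
            ep_return += reward
            done = terminated or truncated
        final_err = float(np.linalg.norm(obs[:3]))  # first 3 dims = waypoint error
        errors.append(final_err)
        successes += final_err < success_radius
        returns.append(ep_return)
    return np.mean(errors), successes / episodes, np.mean(returns)
```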

In conclusion, the research provides a comprehensive analysis of controlling UAVs in the AM sector. It highlights the efficacy of sophisticated DRL techniques in creating a scalable, autonomous control system capable of managing the dynamic, real-time demands posed by challenging aerial construction and repair tasks. The proposed TD3 framework, combined with curriculum learning, represents a significant step toward robust UAV control in real-world AM applications. The authors point to future work on deploying these policies on physical UAVs and scaling the approach to broader AM applications.