GMP$^3$: Learning-Driven, Bellman-Guided Trajectory Planning for UAVs in Real-Time on SE(3)
(arXiv:2509.21264v1)
Published 25 Sep 2025 in cs.RO
Abstract: We propose $\text{GMP}^{3}$, a multiphase global path planning framework that generates dynamically feasible three-dimensional trajectories for unmanned aerial vehicles (UAVs) operating in cluttered environments. The framework extends traditional path planning from Euclidean position spaces to the Lie group $\mathrm{SE}(3)$, allowing joint learning of translational motion and rotational dynamics. A modified Bellman-based operator is introduced to support reinforcement learning (RL) policy updates while leveraging prior trajectory information for improved convergence. $\text{GMP}^{3}$ is designed as a distributed framework in which agents influence each other and share policy information along the trajectory: each agent refines its assigned segment and shares updates with its neighbors via a consensus-based scheme, enabling cooperative policy updates and convergence toward a globally shaped path even under kinematic constraints. We also propose DroneManager, a modular ground control software that interfaces the planner with real UAV platforms via the MAVLink protocol, supporting real-time deployment and feedback. Simulation studies and indoor flight experiments validate the effectiveness of the proposed method in constrained 3D environments, demonstrating reliable obstacle avoidance and smooth, feasible trajectories across both position and orientation. The open-source implementation is available at https://github.com/Domattee/DroneManager
Summary
The paper introduces a distributed reinforcement learning framework that uses a modified Bellman operator for real-time 6-DoF UAV trajectory planning over SE(3).
It leverages consensus-based multi-agent policy updates and various gradient optimizers, with RMSProp achieving the fastest, most stable convergence.
The framework generates collision-free, smooth trajectories in complex 3D environments and is validated through simulation and real-world indoor experiments.
GMP3: Learning-Driven, Bellman-Guided Trajectory Planning for UAVs in Real-Time on SE(3)
Introduction and Motivation
GMP3 introduces a distributed, reinforcement learning-based trajectory planning framework for UAVs operating in cluttered 3D environments, extending the planning domain from Euclidean position spaces to the Lie group SE(3). This enables joint optimization of both translational and rotational dynamics, supporting full six-degree-of-freedom (6-DoF) motion planning. The framework leverages a multi-agent, multi-phase approach, where agents collaboratively refine trajectory segments and share policy information via a consensus protocol, facilitating global path shaping under kinematic constraints. The integration of a modified Bellman operator and influence-aware policy update mechanisms addresses the challenges of convergence and adaptability in dynamic or partially known environments.
Figure 1: 3D visualization of the GMP3 framework, showing original and perturbed trajectories, agent positions, and obstacles under policy π.
Formal Problem Statement and SE(3) Representation
The trajectory planning problem is formulated as a Markov Decision Process (MDP) over SE(3), with the state space comprising rigid-body poses and the action space defined by body-frame twists in the Lie algebra $\mathfrak{se}(3)$. Each agent $A_i$ is represented by a pose $T_{t,i} \in \mathrm{SE}(3)$, and actions are expressed as $\xi_{t,i} = [v_{t,i}, \omega_{t,i}]^{\top} \in \mathbb{R}^{6}$, where $v_{t,i}$ and $\omega_{t,i}$ denote the linear and angular velocity components, respectively. The deterministic kinematics are governed by:

$$T_{t+1,i} = T_{t,i}\,\exp\!\left(\Delta t\,\xi_{t,i}\right)$$
This formulation enables direct optimization of both position and orientation, capturing the full dynamics required for realistic UAV motion.
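As a concrete illustration, the following Python sketch propagates a pose through this exponential-map kinematics. It is a minimal sketch, not the paper's implementation; the `hat` and `step_pose` names and the use of SciPy's general matrix exponential are assumptions.

```python
import numpy as np
from scipy.linalg import expm

def hat(xi):
    """Map twist coordinates xi = [v, w] in R^6 to a 4x4 matrix in se(3)."""
    v, w = xi[:3], xi[3:]
    W = np.array([[0.0,  -w[2],  w[1]],
                  [w[2],  0.0,  -w[0]],
                  [-w[1], w[0],  0.0]])
    X = np.zeros((4, 4))
    X[:3, :3] = W   # skew-symmetric rotational part
    X[:3, 3] = v    # translational part
    return X

def step_pose(T, xi, dt):
    """Discrete SE(3) kinematics: T_{t+1} = T_t @ expm(dt * hat(xi))."""
    return T @ expm(dt * hat(xi))

# Example: from the identity pose, move forward at 1 m/s while yawing at 0.1 rad/s.
T = np.eye(4)
xi = np.array([1.0, 0.0, 0.0, 0.0, 0.0, 0.1])  # [vx, vy, vz, wx, wy, wz]
T_next = step_pose(T, xi, dt=0.1)
```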
Reinforcement Learning-Based Policy Update
GMP3 employs a distributed RL approach in which each agent learns a local policy $\pi_i$ mapping the extended state to a twist vector. The influence-aware policy update rule combines local gradient descent on the trajectory loss, attraction toward each agent's personal best and the global best policies, and consensus alignment with neighboring agents.
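This summary does not reproduce the paper's exact operator, but a schematic form consistent with that description (the step sizes $\alpha, \beta, \gamma, \delta$, personal best $\theta_i^{p}$, global best $\theta^{g}$, and neighbor set $\mathcal{N}_i$ are illustrative notation) is:

$$\theta_i \leftarrow \theta_i \;-\; \alpha\,\nabla_{\theta_i} L_i(\theta_i) \;+\; \beta\,(\theta_i^{p} - \theta_i) \;+\; \gamma\,(\theta^{g} - \theta_i) \;+\; \delta \sum_{j \in \mathcal{N}_i} (\theta_j - \theta_i)$$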
This update structure ensures coordinated learning and global trajectory feasibility, with the loss function penalizing translational and rotational non-smoothness as well as obstacle proximity.
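A minimal sketch of such a loss, assuming uniformly spaced trajectory samples, second-difference smoothness penalties, and a hinge penalty inside a safety radius (the function name, weights, and `d_safe` are illustrative assumptions, not the paper's values):

```python
import numpy as np

def trajectory_loss(positions, rotvecs, obstacles,
                    w_p=1.0, w_r=1.0, w_o=10.0, d_safe=0.5):
    """Penalize translational/rotational roughness and obstacle proximity.

    positions: (N, 3) waypoint positions; rotvecs: (N, 3) rotation-vector
    orientation samples; obstacles: (M, 3) obstacle centers.
    """
    smooth_p = np.sum(np.diff(positions, n=2, axis=0) ** 2)  # translational roughness
    smooth_r = np.sum(np.diff(rotvecs, n=2, axis=0) ** 2)    # rotational roughness
    dists = np.linalg.norm(positions[:, None, :] - obstacles[None, :, :], axis=-1)
    proximity = np.sum(np.maximum(0.0, d_safe - dists) ** 2)  # hinge inside safety radius
    return w_p * smooth_p + w_r * smooth_r + w_o * proximity
```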
Figure 2: Planning result without the influence-aware update.
Figure 3: A screenshot of DroneManager in use, showing simultaneous control of multiple UAVs.
Optimization Strategies
The framework supports several gradient-based optimizers, including Momentum Gradient Descent (MGD), AdaGrad, RMSProp, AdaDelta, and Adam. Each optimizer refines the policy update direction, balancing convergence speed and stability. Empirical results indicate that RMSProp achieves the fastest and most stable convergence, with minimal obstacle violation and smooth trajectory profiles. Adam offers comparable performance but with slightly higher final loss, while MGD and AdaGrad exhibit slower convergence and increased oscillations in orientation.
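For reference, a single RMSProp step applied to a policy parameter vector looks like the following sketch (the hyperparameters shown are common defaults, not values reported in the paper):

```python
import numpy as np

def rmsprop_step(theta, grad, cache, lr=1e-2, decay=0.9, eps=1e-8):
    """One RMSProp update: scale the step by a running RMS of past gradients,
    which damps oscillations along steep loss directions."""
    cache = decay * cache + (1.0 - decay) * grad ** 2
    theta = theta - lr * grad / (np.sqrt(cache) + eps)
    return theta, cache

# Initialize cache = np.zeros_like(theta) and call once per gradient evaluation:
# theta, cache = rmsprop_step(theta, grad_fn(theta), cache)
```

One plausible reason for its stability here is the per-coordinate scaling, which adapts step sizes independently across position and orientation parameters.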
Figure 4: Software architecture of DroneManager, highlighting core, UI, and drone-specific modules.
Simulation and Experimental Validation
Simulations in 3D environments with static obstacles demonstrate that GMP3 consistently generates dynamically feasible, collision-free trajectories with bounded velocity and smooth orientation transitions. The influence-aware update mechanism further stabilizes learning and improves path quality in obstacle-rich scenarios. Real-world indoor flight experiments, conducted using the custom DroneManager software interfaced via MAVLink, validate the practical viability of the approach. The system supports real-time trajectory generation, execution, and safety enforcement, with modular plugin-based extensibility.
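DroneManager's internal API is not detailed in this summary; as an illustration of the MAVLink interface it builds on, the following pymavlink sketch streams a single position/yaw setpoint to a vehicle (the connection string and the target pose are assumptions for a typical SITL setup):

```python
from pymavlink import mavutil

# Connect to the vehicle (assumed SITL/UDP endpoint) and wait for a heartbeat.
master = mavutil.mavlink_connection("udpin:0.0.0.0:14550")
master.wait_heartbeat()

def send_local_setpoint(x, y, z, yaw):
    """Stream one position + yaw setpoint in the local NED frame."""
    type_mask = 0b0000100111111000  # ignore velocity, acceleration, and yaw rate
    master.mav.set_position_target_local_ned_send(
        0,                                   # time_boot_ms
        master.target_system,
        master.target_component,
        mavutil.mavlink.MAV_FRAME_LOCAL_NED,
        type_mask,
        x, y, z,                             # position (m); NED, so z is down
        0, 0, 0,                             # velocity (ignored)
        0, 0, 0,                             # acceleration (ignored)
        yaw, 0)                              # yaw (rad), yaw rate (ignored)

# Example: hold (5, 0, -2) facing north; offboard modes expect setpoints at >= 2 Hz.
send_local_setpoint(5.0, 0.0, -2.0, 0.0)
```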
Figure 5: Experimental setup for evaluating GMP3 in a real-world indoor environment.
Figure 6: Extracted onboard flight data showing obstacles, planned and actual UAV trajectory, and time evolution of position and velocity.
Practical Implications and Theoretical Impact
GMP3 advances the state of the art in UAV trajectory planning by enabling distributed, learning-driven optimization over SE(3), supporting full-body motion planning under real-time constraints. The consensus-based multi-agent structure facilitates scalable deployment in multi-UAV scenarios, while the modular DroneManager software provides a robust interface for integration with physical platforms. The framework's ability to generate smooth, feasible trajectories with strong safety guarantees has direct implications for autonomous navigation in cluttered, dynamic, or safety-critical domains.
Theoretically, the extension of RL-based planning to SE(3) and the introduction of consensus-aware Bellman operators open new avenues for distributed control and cooperative learning in high-dimensional, non-Euclidean spaces. The influence-aware update mechanism provides a principled approach to balancing local adaptation and global alignment, with potential applications in decentralized multi-agent systems and formation control.
Future Directions
Potential future developments include:
Extension to coordinated multi-agent formation control in SE(3), leveraging decentralized communication and adaptive leader-follower strategies.
Integration of online mapping and perception modules for operation in partially known or rapidly evolving environments.
Exploration of hierarchical planning architectures combining GMP3 with local reactive controllers for enhanced robustness.
Benchmarking against state-of-the-art planners in large-scale outdoor scenarios and under dynamic obstacle conditions.
Conclusion
GMP3 provides a comprehensive, learning-driven solution for real-time UAV trajectory planning in 3D environments, combining distributed RL, consensus-based policy updates, and modular software integration. Empirical results demonstrate reliable obstacle avoidance, smooth motion profiles, and robust convergence, with RMSProp emerging as the preferred optimizer for SE(3)-aware planning. The framework's scalability and extensibility position it as a promising foundation for future research in autonomous multi-agent navigation and control.