
Synergistic Reinforcement and Imitation Learning for Vision-driven Autonomous Flight of UAV Along River

Published 17 Jan 2024 in cs.RO (arXiv:2401.09332v2)

Abstract: Vision-driven autonomous flight and obstacle avoidance of Unmanned Aerial Vehicles (UAVs) along complex riverine environments for tasks like rescue and surveillance require a robust control policy, which is yet difficult to obtain due to the shortage of trainable riverine environment simulators. To easily verify the vision-based navigation controller performance for the river following task before real-world deployment, we developed a trainable photo-realistic dynamics-free riverine simulation environment using Unity. In this paper, we address the shortcomings that vanilla Reinforcement Learning (RL) algorithms encounter in learning a navigation policy within this partially observable, non-Markovian environment. We propose a synergistic approach that integrates RL and Imitation Learning (IL). Initially, an IL expert is trained on manually collected demonstrations, which then guides the RL policy training process. Concurrently, experiences generated by the RL agent are utilized to re-train the IL expert, enhancing its ability to generalize to unseen data. By leveraging the strengths of both RL and IL, this framework achieves a faster convergence rate and higher performance compared to pure RL, pure IL, and RL combined with static IL algorithms. The results validate the efficacy of the proposed method in terms of both task completion and efficiency. The code and trainable environments are available.


Summary

  • The paper introduces a novel hybrid reinforcement and imitation learning framework to enhance UAV autonomous flight along riverine environments.
  • It employs a photo-realistic Unity simulation for training via behavior cloning and PPO, addressing challenges in partially observable, non-Markovian settings.
  • Results demonstrate improved navigation efficiency and robustness, with open-source resources provided for further research and development.

Vision-driven Autonomous UAV Flight: Reinforcement and Imitation Learning Synthesis

The paper "Synergistic Reinforcement and Imitation Learning for Vision-driven Autonomous Flight of UAV Along River" (2401.09332) introduces a novel methodology for enhancing Unmanned Aerial Vehicle (UAV) autonomous navigation along riverine environments using a hybrid of Reinforcement Learning (RL) and Imitation Learning (IL). This approach leverages a trainable, photo-realistic simulation environment constructed with Unity to facilitate robust policy training for UAV navigation tasks under complex and partially observable scenarios.

Problem Statement and Approach

Achieving autonomous UAV flight over riverine landscapes for tasks such as search, rescue, and surveillance involves navigating variable and obscured environments. Traditional waypoint navigation fails due to dynamic changes in river geometry and obstacles like bridges and foliage.

Figure 1: System architecture. A human expert collects good trajectories before training and the transitions are represented by blue arrows. a^H_t denotes the human expert's action. The transitions during training are represented by black arrows. a^E_t denotes the IL expert's action.

The proposed solution involves developing an RL and IL blend that aligns the efficiency of real-time decision-making through exploration and human expert guidance, circumventing the limitations of purely relying on either methodology alone. Initially, the system is trained on expert demonstrations utilizing Behavior Cloning (BC) to form a foundational navigation policy. The RL component, facilitated by Proximal Policy Optimization (PPO), integrates experience-driven learning with continual refinement of the policy.
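The warm-start idea above can be sketched at toy scale. The snippet below is a minimal, hypothetical illustration (not the paper's implementation, which trains neural networks with the Stable-Baselines3 and imitation libraries): a linear policy is fit to expert state–action pairs by ridge regression, standing in for the BC step whose output then initializes RL training.

```python
import numpy as np

def behavior_clone(states, actions, l2=1e-3):
    """Fit a linear policy a = s @ W to expert demonstrations by ridge
    regression. A stand-in for the paper's BC step, which trains a neural
    network on human-collected trajectories; the resulting policy can then
    warm-start RL (e.g., PPO) training."""
    d = states.shape[1]
    # Ridge solution: W = (S^T S + l2 * I)^{-1} S^T A
    return np.linalg.solve(states.T @ states + l2 * np.eye(d), states.T @ actions)

# Toy expert on synthetic features: steer back toward the centerline.
rng = np.random.default_rng(0)
S = rng.normal(size=(200, 4))                     # e.g., encoded camera observations
true_W = np.array([[-0.5], [0.0], [0.1], [0.0]])  # hypothetical expert mapping
A = S @ true_W                                    # expert actions
W = behavior_clone(S, A)                          # recovers approximately true_W
```

Because the demonstrations here are exactly linear in the features, BC recovers the expert mapping almost perfectly; with real camera observations the fit is only approximate, which is why the RL stage is needed.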

Simulation Environment and Methodological Innovations

The paper introduces a photo-realistic riverine simulation environment enabled by Unity, which supports flexible UAV navigation tasks in synthetic but realistic scenarios. Critical features of this environment include varied river widths, tributaries, and obstacles that represent realistic navigation challenges.

Figure 2: Comparison of an image captured in Wildcat Creek, Indiana, USA (left) and an image from the Unity river environment (right), which are alike in overall component layout and texture appearance.

To accommodate the partially observable and non-Markovian aspects of the environment, the system employs a cooperative strategy where the BC expert is iteratively retrained with new learning acquired through RL phases. This dual-layer feedback mechanism significantly expedites convergence and optimizes decision policies beyond static IL benchmarks.
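The co-training loop described above can be illustrated with a deliberately tiny example. The code below is a sketch under strong simplifying assumptions (a 1-D "river" with a scalar-gain policy, random-search exploration in place of PPO, and least-squares fitting in place of neural BC); it is not the paper's method, but it exercises the same structure: the RL policy is warm-started from the expert, updates are regularized toward the expert, and the expert is periodically refit on the RL agent's experience.

```python
import numpy as np

rng = np.random.default_rng(1)

def rollout(gain, T=30):
    """One episode in a toy 1-D 'river': state x is the lateral offset from
    the centerline, action a = -gain * x, reward = -|x| per step."""
    x, xs, acts, ret = 2.0, [], [], 0.0
    for _ in range(T):
        a = -gain * x
        xs.append(x); acts.append(a)
        x = x + a + rng.normal(scale=0.05)   # noisy dynamics
        ret -= abs(x)
    return np.array(xs), np.array(acts), ret

def bc_fit(xs, acts):
    """Least-squares behavior cloning for the scalar policy a = -g * x."""
    return -(xs @ acts) / (xs @ xs)

# Human demonstrations from a decent (not optimal) pilot.
demo_x, demo_a, _ = rollout(gain=0.6)
expert_g = bc_fit(demo_x, demo_a)     # initial BC expert
policy_g = expert_g                   # RL policy warm-started from the expert

# Synergistic loop: explore near the current policy, keep the best rollout,
# regularize the update toward the expert, and retrain the expert on the
# RL agent's experience ("dynamic BC").
for _ in range(20):
    cands = policy_g + rng.normal(scale=0.1, size=6)
    results = [rollout(g) + (g,) for g in cands]
    xs, acts, _, best_g = max(results, key=lambda r: r[2])
    policy_g = 0.8 * best_g + 0.2 * expert_g        # imitative expert prior
    demo_x = np.concatenate([demo_x, xs])
    demo_a = np.concatenate([demo_a, acts])
    expert_g = bc_fit(demo_x, demo_a)               # expert generalizes to RL data
```

In this toy setting the policy gain climbs from the demonstrated 0.6 toward the return-optimal value near 1.0, while the refit expert tracks it, mirroring the paper's claim that the retrained expert generalizes beyond the original human demonstrations.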

Results and Analysis

The performance of the hybrid system, across various environments including a grid-based track-following scenario, demonstrates superior task efficiency and completion metrics compared to each individual learning technique. Figure 3 and the associated quantitative evaluation substantiate the claim with evidence of improved rewards and navigation effectiveness.

Figure 4

Figure 3: Demonstrative diagram of the valid activity space and acceptable yaw range of the camera agent in the river following task. h_1 = 1 m, h_2 = 15 m, α = 150° in our experiments.
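The constraint in the Figure 3 caption can be expressed as a simple predicate. The function below is a hypothetical sketch: it assumes the acceptable yaw range is symmetric (±α/2 about the river direction), which the paper's simulator may enforce differently.

```python
def in_valid_activity_space(height_m, yaw_err_deg, h1=1.0, h2=15.0, alpha=150.0):
    """Check the Figure 3 constraint: altitude within [h1, h2] metres and
    camera yaw within the acceptable range, taken here as +/- alpha/2
    about the river direction (an assumption; the paper may differ)."""
    return h1 <= height_m <= h2 and abs(yaw_err_deg) <= alpha / 2.0
```

An episode would terminate (or be penalized) as soon as this predicate becomes false, e.g. `in_valid_activity_space(5.0, 30.0)` holds while `in_valid_activity_space(0.5, 0.0)` does not.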

Trajectory comparisons further corroborate this: the POV + Dynamic BC approach yields routes with minimal backtracking and deviation.

Conclusion and Future Directions

The integration of RL and IL in a dynamically adaptable framework presents a scalable and efficient solution for UAV autonomous flight in complex terrains. Importantly, the authors release both the simulation environment and the methodological code as open source, fostering further exploration and adaptation in advanced UAV control systems.

Although effective, the framework could be further refined by trajectory-filtering mechanisms that dynamically fold the most beneficial exploratory rollouts into the learning process. Enriching the agent's observations to mitigate the non-Markovian nature of the task remains another tangible direction for future work. Finally, transferring the framework to real-world UAVs through simulation-to-reality techniques represents a strategic step toward greater operational autonomy.
