Memory-based Deep Reinforcement Learning for Obstacle Avoidance in UAV with Limited Environment Knowledge
The paper entitled "Memory-based Deep Reinforcement Learning for Obstacle Avoidance in UAV with Limited Environment Knowledge" addresses the development of a UAV navigation system based on deep reinforcement learning (DRL) that operates effectively in unstructured and unknown indoor environments. The research focuses on enabling UAVs to autonomously avoid obstacles, a task considerably more complex than its counterpart for ground robots because of the UAV's additional degrees of freedom of motion and the presence of dynamic, unpredictable obstacles.
Methodological Contributions
The paper introduces a DRL-based methodology leveraging recurrent neural networks (RNNs) with a novel temporal attention mechanism, designed to operate within the UAV's constraints on sensing and onboard computation. Navigation is framed as a Partially Observable Markov Decision Process (POMDP), in which visual input from a single monocular camera is used to infer navigable space and potential collisions.
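As a rough illustration of this class of architecture, the following sketch shows a recurrent Q-network that encodes a short window of depth observations, applies soft temporal attention over the recurrent hidden states, and outputs Q-values for a discrete action set. The layer sizes, window length, and three-action space are assumptions for illustration, not the paper's exact configuration.

```python
# Minimal sketch of a recurrent Q-network with temporal attention (PyTorch).
# Layer sizes, window length, and the 3-action space are illustrative
# assumptions, not the paper's exact configuration.
import torch
import torch.nn as nn


class AttentionDRQN(nn.Module):
    def __init__(self, depth_channels=1, hidden_size=256, num_actions=3):
        super().__init__()
        # Convolutional encoder for a single (assumed 84x84) depth map.
        self.encoder = nn.Sequential(
            nn.Conv2d(depth_channels, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, hidden_size), nn.ReLU(),
        )
        # LSTM aggregates the encoded observations over time.
        self.lstm = nn.LSTM(hidden_size, hidden_size, batch_first=True)
        # Scalar attention score per time step.
        self.attn = nn.Linear(hidden_size, 1)
        self.q_head = nn.Linear(hidden_size, num_actions)

    def forward(self, obs_seq):
        # obs_seq: (batch, time, channels, height, width) window of depth maps.
        b, t = obs_seq.shape[:2]
        feats = self.encoder(obs_seq.flatten(0, 1)).view(b, t, -1)
        hidden_states, _ = self.lstm(feats)                       # (b, t, hidden)
        weights = torch.softmax(self.attn(hidden_states), dim=1)  # (b, t, 1)
        context = (weights * hidden_states).sum(dim=1)            # attention-weighted summary
        return self.q_head(context)                               # (b, num_actions)


# Example: Q-values for one 4-step window of 84x84 depth maps.
q_values = AttentionDRQN()(torch.zeros(1, 4, 1, 84, 84))
print(q_values.shape)  # torch.Size([1, 3])
```

The attention weights let the controller emphasize the frames most relevant to the current decision rather than relying only on the final hidden state.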
Key Components of the System
- Depth Map Prediction: The research uses a Conditional Generative Adversarial Network (cGAN) to predict depth maps from the RGB images captured by the UAV's monocular camera. Casting depth estimation as image-to-image translation allows the cGAN to supply the dense depth predictions needed for real-time obstacle avoidance without a dedicated depth sensor (a schematic generator sketch follows this list).
- Deep Recurrent Q-Network with Temporal Attention: The UAV controller employs a Deep Q-Network (DQN) augmented with recurrent layers and temporal attention, of the kind sketched above. The recurrence retains temporal context across frames, and the attention mechanism weights the most informative past observations, improving the UAV's ability to handle complex obstacle configurations and moving objects.
- Reinforcement Learning Framework: A POMDP model defines the state and action spaces of the environment, and the UAV learns its policy through interaction with simulated environments. The reward structure is designed to encourage energy-efficient, collision-free navigation (a hypothetical reward function of this kind is sketched after this list).
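To make the depth-prediction step more concrete, below is a minimal sketch of the generator half of a conditional GAN in the encoder-decoder style commonly used for image-to-image translation. The filter counts and input resolution are assumptions, and the discriminator and adversarial training loop are omitted; this is not the paper's exact network.

```python
# Minimal sketch of a cGAN generator for RGB-to-depth translation (PyTorch).
# The encoder-decoder layout, filter counts, and 128x128 input are assumptions
# for illustration; the discriminator and adversarial/L1 training are omitted.
import torch
import torch.nn as nn


class DepthGenerator(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: downsample the RGB image into a compact feature map.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.BatchNorm2d(128), nn.LeakyReLU(0.2),
            nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.BatchNorm2d(256), nn.LeakyReLU(0.2),
        )
        # Decoder: upsample back to a single-channel depth map in [0, 1].
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1), nn.BatchNorm2d(128), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.ConvTranspose2d(64, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, rgb):
        return self.decoder(self.encoder(rgb))


# Example: predict a depth map for one 128x128 RGB frame.
depth = DepthGenerator()(torch.zeros(1, 3, 128, 128))
print(depth.shape)  # torch.Size([1, 1, 128, 128])
```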
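Similarly, the reward shaping described in the last item can be illustrated with a small, hypothetical function. The coefficients and the "wobble" penalty below are assumed values chosen to mirror the stated goals of collision-free, energy-efficient flight, not the paper's actual reward.

```python
# Hypothetical reward shaping for collision-free, energy-efficient navigation.
# The coefficients and terms are illustrative assumptions, not the paper's values.

def step_reward(collided: bool, forward_progress: float, turn_rate: float) -> float:
    """Reward for one control step.

    collided:          True if the UAV hit an obstacle this step.
    forward_progress:  distance (m) moved toward open space this step.
    turn_rate:         magnitude of the commanded yaw change (rad/s).
    """
    if collided:
        return -10.0                      # large penalty; episode terminates
    reward = 1.0 * forward_progress       # encourage covering distance
    reward -= 0.1 * abs(turn_rate)        # discourage wobbling / wasted motion
    return reward


# Example: a smooth forward step versus an oscillating one.
print(step_reward(False, forward_progress=0.5, turn_rate=0.05))  # 0.495
print(step_reward(False, forward_progress=0.5, turn_rate=1.0))   # 0.4
```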
Experimental Evaluation
The paper conducts extensive experiments in simulated environments, demonstrating that the UAV navigates with significantly greater robustness than baseline models such as standard DQNs. The results show substantial improvements in distance covered between collisions and in energy efficiency, evidenced by a reduction in unnecessary motion, referred to as "wobbling." The UAV successfully navigates environments containing both static and dynamic obstacles, including scenarios with moving human actors.
Insights and Implications
The findings reinforce the importance of memory and temporal information in robotic decision-making under partial observability. Introducing temporal attention within an RNN framework can significantly improve UAV navigation performance under constrained sensory input. Furthermore, noise-augmented training and evaluation across diverse environment setups suggest that the method could transfer to real-world deployment.
Future Directions
To enhance applicability and robustness, future research could focus on improving depth prediction accuracy and extending the approach to outdoor scenarios. Exploring alternative network architectures or integrating scene prediction may further improve navigation performance, and incorporating regret-minimization strategies into policy learning may yield more resilient navigation under diverse and unforeseen conditions.
Overall, the research marks a significant contribution to UAV autonomy, providing a framework that combines modern DRL techniques with practical UAV constraints and demonstrates effective navigation despite limited knowledge of the environment.