Analyzing Decentralized Motion Planning for Multi-Robot Navigation Using Deep Reinforcement Learning
This paper addresses a challenging problem at the intersection of robotics and reinforcement learning: decentralized motion planning for multi-robot navigation using deep reinforcement learning (DRL). The goal is a system in which multiple robots, modeled as non-holonomic differential-drive robots, navigate to their individual goals while avoiding collisions with one another and with static obstacles.
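For context, a differential-drive robot follows unicycle kinematics: it can drive forward and rotate in place, but it cannot translate sideways. A minimal integration step, assuming the action space maps to linear and angular velocity commands (the paper's exact action parameterization is not restated here), might look like this:

```python
import math
from dataclasses import dataclass

@dataclass
class Pose:
    x: float      # position (m)
    y: float      # position (m)
    theta: float  # heading (rad)

def step_unicycle(pose: Pose, v: float, omega: float, dt: float) -> Pose:
    """Integrate the non-holonomic unicycle model for one time step.

    v is the commanded linear velocity (m/s) and omega the angular
    velocity (rad/s); the absence of any lateral motion term is the
    non-holonomic constraint.
    """
    return Pose(
        x=pose.x + v * math.cos(pose.theta) * dt,
        y=pose.y + v * math.sin(pose.theta) * dt,
        theta=pose.theta + omega * dt,
    )
```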
Key Contributions and Methodology
The paper's core contribution is a decentralized DRL framework for the motion-planning problem of a set of cooperative robots. The methodology models each agent's decision-making as a partially observable Markov decision process (POMDP): every robot acts on a policy informed by local state observations and sparse communication with its peers, eliminating the centralized coordinator that could become a bottleneck in dynamic environments.
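To illustrate what "local observations" means in practice, the sketch below shows one plausible per-agent observation structure; the field names and contents are hypothetical, since the paper's exact observation vector is not reproduced here:

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class LocalObservation:
    """Each robot observes only its local surroundings, never the full
    world state -- this partial view is what makes the model a POMDP."""
    goal_relative: Tuple[float, float]          # goal position in the robot's frame
    velocity: Tuple[float, float]               # current (linear, angular) velocity
    neighbor_states: List[Tuple[float, float]]  # sparsely communicated peer positions
    obstacle_ranges: List[float]                # e.g., lidar-like distances to obstacles

def act(policy: Callable[[LocalObservation], Tuple[float, float]],
        obs: LocalObservation) -> Tuple[float, float]:
    """Decentralized control: each robot queries its own policy with its
    local observation; no central coordinator is consulted."""
    return policy(obs)
```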
The work employs a custom-built simulator, MARL Simulator, built on the Unity engine and its ML-Agents Toolkit and designed to evaluate multi-agent reinforcement learning behaviors. This choice gives flexible control over environment settings and scenarios, which suits research on autonomous systems. The simulator is used to train DRL models in a square arena in which four agents must navigate safely to predetermined goal positions.
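For readers unfamiliar with ML-Agents, a Unity build such as this simulator is typically driven from Python through the toolkit's low-level API. Below is a minimal sketch of that loop; the executable name is a placeholder, and random actions stand in for the trained policy:

```python
import numpy as np
from mlagents_envs.environment import UnityEnvironment
from mlagents_envs.base_env import ActionTuple

# "MARLSimulator" is a placeholder path to the compiled Unity build.
env = UnityEnvironment(file_name="MARLSimulator")
env.reset()
behavior_name = list(env.behavior_specs)[0]
spec = env.behavior_specs[behavior_name]

for _ in range(1000):
    decision_steps, terminal_steps = env.get_steps(behavior_name)
    n_agents = len(decision_steps)
    # Random continuous actions stand in for the trained DRL policy.
    actions = ActionTuple(
        continuous=np.random.uniform(
            -1.0, 1.0, size=(n_agents, spec.action_spec.continuous_size)))
    env.set_actions(behavior_name, actions)
    env.step()

env.close()
```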
Reward Structure and Optimization Objective
The reward function is crafted to guide agents to their goals while minimizing collisions, shaping both the spatial trajectory and the time taken to reach the goal. Reaching the goal earns a positive reward, while collisions incur penalties, steering the optimization toward safe trajectories with short time-to-goal.
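The paper's exact coefficients are not reproduced here, but a reward with this shape (terminal goal bonus, collision penalty, dense progress term, small per-step cost) could be sketched as follows; every numeric value is a placeholder assumption:

```python
def navigation_reward(reached_goal: bool, collided: bool,
                      prev_goal_dist: float, goal_dist: float) -> float:
    """Illustrative reward shaping for collision-free goal navigation.

    All coefficients are placeholder assumptions, not the paper's values.
    """
    r = 0.0
    if reached_goal:
        r += 10.0                             # terminal bonus for reaching the goal
    if collided:
        r -= 10.0                             # penalty for robot/obstacle collisions
    r += 1.0 * (prev_goal_dist - goal_dist)   # dense progress term (spatial shaping)
    r -= 0.01                                 # small per-step cost: shortens time-to-goal
    return r
```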
The optimization problem is posed as maximizing the expected discounted future reward, standard practice in reinforcement learning, which drives the agents toward policies that accomplish the task optimally. The problem is further subject to kinematic and collision constraints that reflect realistic robotic motion planning.
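In standard notation, this objective for each agent i can be written as below; this is a reconstruction from the description above rather than the paper's exact formulation, and R (the robot radius, with 2R as the minimum separation for identical robots) is an assumed symbol:

```latex
\pi_i^{*} = \arg\max_{\pi_i} \; \mathbb{E}\!\left[\sum_{t=0}^{T} \gamma^{t}\, r_i(s_t, a_t)\right]
```

subject to the non-holonomic kinematics and pairwise collision-avoidance constraints

```latex
\dot{x} = v\cos\theta, \qquad \dot{y} = v\sin\theta, \qquad \dot{\theta} = \omega, \qquad
\lVert p_i(t) - p_j(t) \rVert \ge 2R \;\; \forall j \ne i,
```

where gamma is the discount factor and p_i(t) is the position of robot i at time t.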
Experimental Design and Results
Three distinct experimental setups are designed to test the system: Go-to-Goal with Collision Avoidance (G2GCA), Antipodal Exchange (APE), and G2GCA with Random Initialization (G2GCARI). These setups vary in complexity and agent initialization, testing the robustness and adaptability of the learned policies under different conditions.
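To make the distinction between the setups concrete, here is a hedged sketch of how the three initializations might be generated; the arena size, circle radius, and the fixed G2GCA layout are assumptions, and only the scenario semantics follow the paper:

```python
import numpy as np

def init_episode(scenario: str, n_agents: int = 4, half: float = 5.0, rng=None):
    """Return (starts, goals) as (n_agents, 2) arrays for the three setups."""
    rng = np.random.default_rng() if rng is None else rng
    if scenario == "APE":
        # Antipodal Exchange: start on a circle, goal is the opposite point,
        # so every trajectory crosses the congested arena center.
        angles = np.linspace(0.0, 2.0 * np.pi, n_agents, endpoint=False)
        starts = half * np.stack([np.cos(angles), np.sin(angles)], axis=1)
        goals = -starts
    elif scenario == "G2GCARI":
        # Random initialization: starts and goals sampled uniformly in the arena.
        starts = rng.uniform(-half, half, size=(n_agents, 2))
        goals = rng.uniform(-half, half, size=(n_agents, 2))
    else:
        # G2GCA: one fixed, hand-placed configuration (the corner positions
        # here are illustrative), with goals permuted to force crossing paths.
        starts = np.array([[-half, -half], [half, -half],
                           [half, half], [-half, half]], dtype=float)
        goals = np.roll(starts, shift=1, axis=0)
    return starts, goals
```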
Results from the experiments, which include both quantitative metrics and qualitative analyses, suggest that the DRL-based method achieves a respectable degree of success across the navigation tasks. Notably, the common-policy approach (all agents sharing a single policy) proves more efficient than training an individual policy per agent: it converges faster while achieving a comparable success rate.
Implications and Future Work
The findings of this research have significant implications for fields such as autonomous vehicle coordination, warehouse automation, and search and rescue operations. By decentralizing the decision-making process and allowing for individualized learning in shared environments, this paper paves the way for more scalable and robust solutions in dynamic multi-agent systems.
Future research avenues include expanding the framework to incorporate non-cooperative agents, which presents additional challenges related to competition and potential conflicts of interest in shared spaces. Moreover, further exploration into more complex and larger-scale environments would be valuable to test the limits of the proposed DRL methodologies and refine them for practical deployment in real-world scenarios.
In conclusion, this paper contributes meaningfully to the disciplines of multi-robot systems and deep reinforcement learning by offering a decentralized approach to motion planning that is both computationally efficient and robust across varying degrees of environment complexity. The proposed framework and its findings serve as a foundational reference for future work aiming to optimize multi-robot systems in decentralized, dynamic, and partially observable environments.