Decentralized Structural-RNN for Robot Crowd Navigation with Deep Reinforcement Learning (2011.04820v3)

Published 9 Nov 2020 in cs.RO, cs.AI, and cs.LG

Abstract: Safe and efficient navigation through human crowds is an essential capability for mobile robots. Previous work on robot crowd navigation assumes that the dynamics of all agents are known and well-defined. In addition, the performance of previous methods deteriorates in partially observable environments and environments with dense crowds. To tackle these problems, we propose decentralized structural-Recurrent Neural Network (DS-RNN), a novel network that reasons about spatial and temporal relationships for robot decision making in crowd navigation. We train our network with model-free deep reinforcement learning without any expert supervision. We demonstrate that our model outperforms previous methods in challenging crowd navigation scenarios. We successfully transfer the policy learned in the simulator to a real-world TurtleBot 2i.

Authors (5)

Shuijing Liu
Peixin Chang
Weihang Liang
Neeloy Chakraborty
Katherine Driggs-Campbell

Citations (93)

Summary

The paper "Decentralized Structural-RNN for Robot Crowd Navigation with Deep Reinforcement Learning" addresses safe and efficient navigation for mobile robots in dense and partially observable crowds. In this setting the dynamics are decentralized: each agent follows its own policy, and the robot typically does not know the humans' intentions or walking patterns. Prior methods struggle with complex human interactions, which leads to failure modes such as the freezing robot problem.

Method and Approach

The key contribution is the decentralized Structural-Recurrent Neural Network (DS-RNN), a network architecture that reasons over spatio-temporal relationships for robot decision making. Unlike previous models that rely on pre-defined human dynamics or expert supervision, DS-RNN is trained with model-free deep reinforcement learning (RL) to learn navigation policies.

DS-RNN is built on a spatio-temporal graph (st-graph) that captures the interactions between the robot and the surrounding agents over time. Nodes represent the agents (the robot and the humans). Spatial edges capture the relationships between agents at a given timestep, and temporal edges capture the progression of the robot's state across timesteps.
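The st-graph described above can be sketched as a small data structure. This is a minimal illustration, not the authors' code: the names `StGraphStep` and `build_st_graph` are hypothetical, and the choice of edge features (human position relative to the robot for spatial edges, robot displacement between steps for the temporal edge) is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Vec = Tuple[float, float]


@dataclass
class StGraphStep:
    """One timestep of a robot-centred st-graph (hypothetical layout)."""
    spatial_edges: Dict[int, Vec]  # human id -> position relative to the robot
    temporal_edge: Vec             # robot displacement since the previous step


def build_st_graph(robot_traj: List[Vec],
                   human_trajs: Dict[int, List[Vec]]) -> List[StGraphStep]:
    """Unroll robot and human trajectories into per-timestep edge features."""
    steps = []
    for t, (rx, ry) in enumerate(robot_traj):
        # Spatial edges: one robot-human edge per human at this timestep.
        spatial = {hid: (traj[t][0] - rx, traj[t][1] - ry)
                   for hid, traj in human_trajs.items()}
        # Temporal edge: links the robot node at t-1 to the robot node at t.
        if t == 0:
            temporal = (0.0, 0.0)  # assumed: no motion before the first step
        else:
            px, py = robot_traj[t - 1]
            temporal = (rx - px, ry - py)
        steps.append(StGraphStep(spatial, temporal))
    return steps


graph = build_st_graph(
    robot_traj=[(0.0, 0.0), (1.0, 0.0)],
    human_trajs={0: [(2.0, 2.0), (2.0, 1.0)]},
)
```

Keeping every feature relative to the robot means the graph only needs what the robot can observe, which fits the decentralized setting.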

Network Architecture

DS-RNN combines three RNN components: a spatial edgeRNN, a temporal edgeRNN, and a nodeRNN. Each component processes one type of feature, which lets the network decompose the navigation task into smaller factors. The spatial edgeRNN handles the interaction between the robot and each human, while the temporal edgeRNN tracks the continuity of the robot's motion. The nodeRNN then combines their outputs to produce the navigation policy.
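The data flow between the three components can be sketched with a toy recurrent cell. Everything here is an assumption made for illustration: `rnn_step` uses fixed scalar weights instead of learned matrices, spatial hidden states are combined by a plain average, and the nodeRNN input is an elementwise sum so that all vectors keep the same size. The function name `ds_rnn_step` is hypothetical.

```python
import math
from typing import Dict, List

Vec = List[float]


def rnn_step(x: Vec, h: Vec, w_in: float = 0.5, w_rec: float = 0.5) -> Vec:
    """Toy recurrent cell with scalar weights: h' = tanh(w_in*x + w_rec*h)."""
    return [math.tanh(w_in * xi + w_rec * hi) for xi, hi in zip(x, h)]


def ds_rnn_step(spatial_feats: List[Vec], temporal_feat: Vec,
                node_feat: Vec, state: Dict) -> Vec:
    """Run one timestep through the three components and update `state`."""
    # Spatial edgeRNN: the same cell is applied to every robot-human edge.
    state["spatial"] = [rnn_step(x, h)
                        for x, h in zip(spatial_feats, state["spatial"])]
    # Temporal edgeRNN: follows the robot's own motion over time.
    state["temporal"] = rnn_step(temporal_feat, state["temporal"])
    # Assumed simplification: average the spatial hidden states.
    n = len(state["spatial"])
    dim = len(node_feat)
    pooled = [sum(h[j] for h in state["spatial"]) / n for j in range(dim)]
    # NodeRNN: combines the robot's features with both edge summaries.
    node_in = [a + b + c
               for a, b, c in zip(node_feat, state["temporal"], pooled)]
    state["node"] = rnn_step(node_in, state["node"])
    return state["node"]  # would feed the policy head in a full model


state = {"spatial": [[0.0, 0.0], [0.0, 0.0]],
         "temporal": [0.0, 0.0],
         "node": [0.0, 0.0]}
out = ds_rnn_step(
    spatial_feats=[[1.0, 1.0], [2.0, 0.0]],  # two humans, relative positions
    temporal_feat=[1.0, 0.0],                # robot motion feature
    node_feat=[0.0, 0.0],                    # robot node feature
    state=state,
)
```

Sharing one spatial cell across all humans keeps the parameter count independent of crowd size, which is one plausible reason for factoring the network this way.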

An attention mechanism is embedded in the architecture; it weights spatial information dynamically based on the robot's temporal dynamics. This lets the policy focus on the most critical interactions, which improves robustness and adaptability in complex, unpredictable environments.
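One way to picture this weighting is scaled dot-product attention, with the temporal hidden state as the query and the per-human spatial hidden states as keys and values. This is a sketch under that assumption; the paper's exact attention formulation may differ, and the function names below are hypothetical.

```python
import math
from typing import List, Tuple

Vec = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(temporal_h: Vec, spatial_hs: List[Vec]) -> Tuple[List[float], Vec]:
    """Weight each human's spatial state by its match with the robot's state."""
    d = len(temporal_h)
    # One score per human: scaled dot product between query and key.
    scores = [sum(q * k for q, k in zip(temporal_h, h)) / math.sqrt(d)
              for h in spatial_hs]
    weights = softmax(scores)
    # Context vector: attention-weighted sum of the spatial hidden states.
    context = [sum(w * h[j] for w, h in zip(weights, spatial_hs))
               for j in range(d)]
    return weights, context


weights, context = attend(
    temporal_h=[1.0, 0.0],
    spatial_hs=[[1.0, 0.0], [0.0, 1.0]],  # two humans
)
```

In this toy input the first human's state aligns with the robot's state, so it receives the larger weight.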

Simulation and Real-world Evaluation

DS-RNN is compared against several baselines in simulated environments of varying complexity, including different field-of-view constraints and crowd densities. The results show better success rates and average navigation times for DS-RNN, particularly in environments where traditional methods time out or collide frequently.
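The metrics named above can be computed from per-episode outcomes as in the following sketch. The episode format and the `summarize` helper are hypothetical, the sample data is invented for illustration and is not from the paper, and averaging navigation time over successful episodes only is an assumption.

```python
from typing import Dict, List, Tuple

Episode = Tuple[str, float]  # (outcome, elapsed time in seconds)


def summarize(episodes: List[Episode]) -> Dict[str, float]:
    """Outcome rates over all episodes, mean time over successful ones."""
    n = len(episodes)
    if n == 0:
        raise ValueError("need at least one episode")
    stats = {
        f"{name}_rate": sum(1 for o, _ in episodes if o == name) / n
        for name in ("success", "collision", "timeout")
    }
    # Assumed convention: only successful episodes count toward nav time.
    ok_times = [t for o, t in episodes if o == "success"]
    stats["mean_nav_time"] = (sum(ok_times) / len(ok_times)
                              if ok_times else float("nan"))
    return stats


# Illustrative episodes only.
stats = summarize([
    ("success", 10.0),
    ("success", 14.0),
    ("collision", 3.0),
    ("timeout", 25.0),
])
```

Reporting timeouts separately from collisions matters here because a policy that freezes in place avoids collisions but still fails the task.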

The policy is also validated in the real world on a TurtleBot 2i, which navigates among pedestrians detected with YOLOv3 and tracked with Deep SORT. The successful transfer from simulation to the real robot supports the model's practical applicability.

Implications and Future Work

DS-RNN's ability to operate without predefined human dynamics or expert supervision is a notable step for autonomous crowd navigation. Combining deep RL with spatio-temporal reasoning opens the way to further work in real-world settings with richer and more dynamic human interactions.

Future work could address the recognition of mutual interactions between robots and humans, and could improve sensor integration to reduce noise in human position detection. Both directions would make autonomous navigation more robust and applicable in more complex environments.
