Learning Effective Navigation Policies on 3D Scene Graphs with Graph Neural Networks
The paper "Learning Effective Navigation Policies on 3D Scene Graphs with Graph Neural Networks" presents a reinforcement learning framework that leverages 3D Dynamic Scene Graphs (DSGs) for robotic navigation in complex environments. The framework exploits high-level hierarchical representations to improve navigation policy learning, using graph neural networks (GNNs) to map scene structure into robot actions.
Overview
Traditional approaches to robotic navigation frequently rely on direct mappings from raw sensor inputs to actions, which tend to be sample-inefficient and generalize poorly to unfamiliar spaces. In this context, the proposed framework uses DSGs, which encode environments at multiple levels of abstraction spanning geometry, topology, and semantics. DSGs provide comprehensive representations with layers for objects, places, rooms, and more, which can be transformative for navigation tasks that require both spatial awareness and semantic understanding.
Methodology
Central to the framework is the concept of graph observations, in which part of the DSG is translated into node and edge structures accessible to the learning agent. The algorithm comprises three primary elements: constructing a graph observation from the DSG, message passing with graph neural networks to derive embedded feature vectors, and a policy network that selects robot actions from these features.
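A minimal NumPy sketch of this three-stage pipeline may help make the structure concrete. All dimensions, weights, and the mean-aggregation message-passing rule below are hypothetical illustrations, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# --- 1. Graph observation (toy stand-in for a DSG subgraph) ---
node_feats = rng.normal(size=(5, 8))            # 5 nodes, 8-dim features
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]  # (src, dst) index pairs

# --- 2. One round of mean-aggregation message passing ---
def message_pass(x, edges, w_self, w_nbr):
    agg = np.zeros_like(x)
    counts = np.zeros(len(x))
    for s, d in edges:                 # treat edges as undirected
        agg[d] += x[s]; counts[d] += 1
        agg[s] += x[d]; counts[s] += 1
    agg /= np.maximum(counts, 1)[:, None]   # mean over neighbors
    return np.tanh(x @ w_self + agg @ w_nbr)

w_self = rng.normal(size=(8, 8))
w_nbr = rng.normal(size=(8, 8))
h = message_pass(node_feats, edges, w_self, w_nbr)  # node embeddings

# --- 3. Policy head: score candidate action nodes, pick one ---
action_nodes = [3, 4]                  # indices of action-layer nodes
w_policy = rng.normal(size=(8,))
logits = h[action_nodes] @ w_policy
probs = np.exp(logits - logits.max())
probs /= probs.sum()                   # softmax over action nodes
chosen = action_nodes[int(np.argmax(probs))]
```

Restricting the policy's output to action nodes (rather than a fixed action space) is what lets the same network operate on graphs of varying size and topology.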
The graph observation additionally incorporates an Action layer that supports the robot's immediate decision making: it links high-level DSG nodes to candidate action nodes, enabling navigation through free space. Nodes are enriched with features including node type, semantic label, positional information, and an indicator of traversability.
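The per-node features can be concatenated into a single vector before message passing. The sketch below illustrates one such encoding; the vocabularies, feature ordering, and the `encode_node` helper are hypothetical, not the paper's exact scheme:

```python
import numpy as np

# Hypothetical vocabularies for a DSG-style graph observation
NODE_TYPES = ["object", "place", "room", "action"]
SEMANTIC_LABELS = ["chair", "desk", "corridor", "office", "free_space"]

def encode_node(node_type, semantic_label, position, traversable):
    """Concatenate one-hot type, one-hot semantic label, 3D position,
    and a traversability flag into one feature vector."""
    type_vec = np.eye(len(NODE_TYPES))[NODE_TYPES.index(node_type)]
    label_vec = np.eye(len(SEMANTIC_LABELS))[SEMANTIC_LABELS.index(semantic_label)]
    return np.concatenate([
        type_vec,                         # 4 dims: which DSG layer
        label_vec,                        # 5 dims: semantic class
        np.asarray(position, float),      # 3 dims: x, y, z
        [1.0 if traversable else 0.0],    # 1 dim: traversability
    ])

# An action node over free space at a hypothetical position
feat = encode_node("action", "free_space", (1.5, -0.3, 0.0), True)
```

One-hot encodings keep the feature vector interpretable, though learned embeddings for types and labels are a common alternative.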
Results
In experiments on multi-object search in a semantically rich simulated indoor office environment, the proposed approach outperformed baseline methods relying on RGB images, RGB-D images with semantic segmentation, and ESDF slices. Quantitative evaluations showed improved efficiency in terms of targets found and space explored, underscoring the benefit of the hierarchical information and explicit memory provided by DSGs.
Implications and Future Directions
The implications of this research are multifaceted. Practically, the ability to leverage DSGs for robotic navigation points to advances in autonomous exploration and task execution within complex environments. Theoretically, it invites further inquiry into hierarchical representation paradigms for scene graphs and their utility in applications beyond navigation.
Noteworthy future research avenues include refining the Action layer for richer interaction capabilities, exploring deeper feature embeddings for nodes, and integrating real-time DSG construction systems such as Hydra. Transitioning these capabilities to real-world robot deployments could significantly advance operational efficiency and semantic understanding in autonomous navigation systems.
The framework outlined in this paper demonstrates how hierarchical scene understanding via dynamic scene graphs can not only facilitate navigation tasks but also support robust policy learning that scales to large, complex environments.