Semi-parametric Topological Memory for Navigation (1803.00653v1)

Published 1 Mar 2018 in cs.LG, cs.AI, cs.CV, and cs.RO

Abstract: We introduce a new memory architecture for navigation in previously unseen environments, inspired by landmark-based navigation in animals. The proposed semi-parametric topological memory (SPTM) consists of a (non-parametric) graph with nodes corresponding to locations in the environment and a (parametric) deep network capable of retrieving nodes from the graph based on observations. The graph stores no metric information, only connectivity of locations corresponding to the nodes. We use SPTM as a planning module in a navigation system. Given only 5 minutes of footage of a previously unseen maze, an SPTM-based navigation agent can build a topological map of the environment and use it to confidently navigate towards goals. The average success rate of the SPTM agent in goal-directed navigation across test environments is higher than the best-performing baseline by a factor of three. A video of the agent is available at https://youtu.be/vRF7f4lhswo

Authors (3)

Nikolay Savinov (16 papers)
Alexey Dosovitskiy (49 papers)
Vladlen Koltun (114 papers)

Citations (362)

Summary

Review of "Semi-parametric Topological Memory for Navigation"

The paper "Semi-parametric Topological Memory for Navigation" by Savinov, Dosovitskiy, and Koltun introduces a memory architecture for navigation in previously unseen environments, inspired by landmark-based navigation in animals. The proposed semi-parametric topological memory (SPTM) combines a non-parametric graph, which records the connectivity of locations in an environment, with a parametric deep network that retrieves graph nodes from observations. The paper shows that the method can build a topological map from limited exposure to a new environment and then use that map for goal-directed navigation.

Methodology Overview

The SPTM architecture has two primary components: a non-parametric graph and a parametric deep retrieval network. The graph is built from the observations gathered while exploring an environment. Each node corresponds to a location, and edges record only which locations are connected; the graph stores no metric information. In addition to the connectivity that follows from the exploration sequence, shortcut connections are added between nodes whose observations are visually similar.
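The graph construction described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `build_memory_graph`, the parameters `shortcut_threshold` and `min_time_gap`, and the `similarity` callable (a stand-in for the learned retrieval network) are all hypothetical.

```python
from typing import Callable, Dict, Sequence, Set


def build_memory_graph(
    observations: Sequence[object],
    similarity: Callable[[object, object], float],
    shortcut_threshold: float = 0.95,  # assumed value, for illustration only
    min_time_gap: int = 3,             # assumed value, for illustration only
) -> Dict[int, Set[int]]:
    """Build an undirected topological graph with one node per observation."""
    n = len(observations)
    graph: Dict[int, Set[int]] = {i: set() for i in range(n)}

    # Temporal edges: consecutive observations in the exploration sequence.
    for i in range(n - 1):
        graph[i].add(i + 1)
        graph[i + 1].add(i)

    # Shortcut edges: observations far apart in time that look alike.
    for i in range(n):
        for j in range(i + min_time_gap, n):
            if similarity(observations[i], observations[j]) > shortcut_threshold:
                graph[i].add(j)
                graph[j].add(i)
    return graph


def toy_similarity(a: float, b: float) -> float:
    """Toy stand-in for the retrieval network: observations are 1D positions."""
    return 1.0 if abs(a - b) < 0.5 else 0.0


# A walk that ends close to where it started; the last observation is
# visually similar to the first, so a shortcut edge closes the loop.
walk = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 0.1]
memory_graph = build_memory_graph(walk, toy_similarity)
```

Note that the graph holds only node indices and adjacency sets; no coordinates or distances are stored, which matches the "no metric information" property described above.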

The retrieval network, trained via self-supervision, identifies the graph nodes that correspond to a given observation. It has a siamese structure: both observations pass through the same encoder, and the resulting features are compared to estimate how close the two observations are. A separate locomotion network handles short-range navigation. It is a feedforward network trained to map a pair of current and target observations to probabilities over discrete actions.
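To make the siamese idea concrete, the sketch below shows the structural point only: one set of encoder weights is applied to both inputs, and a comparison step turns the two embeddings into a similarity score. Everything here is a hypothetical toy. The class name `ToySiameseRetrieval`, the random untrained weights, and the `exp(-distance)` comparison are assumptions chosen for brevity; the paper's network is a trained deep network, and its comparison head is learned rather than fixed.

```python
import math
import random
from typing import List, Sequence


class ToySiameseRetrieval:
    """Untrained toy model illustrating weight sharing between two branches."""

    def __init__(self, input_dim: int, embed_dim: int = 8, seed: int = 0) -> None:
        rng = random.Random(seed)
        # A single weight matrix, shared by both branches of the siamese pair.
        self.weights: List[List[float]] = [
            [rng.gauss(0.0, 1.0) for _ in range(input_dim)]
            for _ in range(embed_dim)
        ]

    def encode(self, observation: Sequence[float]) -> List[float]:
        """Shared encoder: a single tanh layer applied to one observation."""
        return [
            math.tanh(sum(w * x for w, x in zip(row, observation)))
            for row in self.weights
        ]

    def score(self, obs_a: Sequence[float], obs_b: Sequence[float]) -> float:
        """Similarity in (0, 1]; a fixed stand-in for a learned comparison head."""
        emb_a = self.encode(obs_a)
        emb_b = self.encode(obs_b)
        squared_distance = sum((a - b) ** 2 for a, b in zip(emb_a, emb_b))
        return math.exp(-squared_distance)


model = ToySiameseRetrieval(input_dim=4)
same = model.score([1.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0])
different = model.score([1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0])
```

Because both inputs go through the same `encode` method, identical observations always receive the maximum score, and the score is symmetric in its two arguments.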

During navigation, SPTM localizes the agent and the goal in the memory graph, plans a path between them, and outputs a nearby waypoint on that path for the locomotion network to reach. This two-level design separates long-range planning over the graph from short-range control, so the agent replans from its current observation as it moves towards the goal.
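The localize, plan, and pick-a-waypoint loop can be sketched as follows. This is an illustrative sketch under assumptions, not the paper's procedure: the function names, the `reach_threshold` and `max_lookahead` parameters, the use of breadth-first search on an unweighted graph, and the toy `similarity` function (standing in for the retrieval network) are all hypothetical choices.

```python
from collections import deque
from typing import Callable, Dict, List, Optional, Sequence, Set


def localize(observation, memory: Sequence, similarity: Callable) -> int:
    """Return the index of the memory node most similar to the observation."""
    return max(range(len(memory)), key=lambda i: similarity(observation, memory[i]))


def shortest_path(graph: Dict[int, Set[int]], start: int, goal: int) -> Optional[List[int]]:
    """Breadth-first search; returns the node sequence from start to goal."""
    parents: Dict[int, Optional[int]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in sorted(graph[node]):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None  # goal not reachable from start


def select_waypoint(observation, path: List[int], memory: Sequence,
                    similarity: Callable, reach_threshold: float = 0.5,
                    max_lookahead: int = 3) -> int:
    """Pick the furthest node on the path that still looks reachable."""
    waypoint = path[min(1, len(path) - 1)]  # fallback: the next node on the path
    for node in path[1:1 + max_lookahead]:
        if similarity(observation, memory[node]) >= reach_threshold:
            waypoint = node
    return waypoint


def similarity(a: float, b: float) -> float:
    """Toy stand-in for the retrieval network on 1D positions."""
    return 1.0 / (1.0 + abs(a - b))


# Toy corridor walked out and back: chain edges plus three visual shortcuts.
memory = [0.0, 1.0, 2.0, 3.0, 2.1, 1.1, 0.1, -1.0, -2.0]
graph = {0: {1, 6}, 1: {0, 2, 5}, 2: {1, 3, 4}, 3: {2, 4}, 4: {2, 3, 5},
         5: {1, 4, 6}, 6: {0, 5, 7}, 7: {6, 8}, 8: {7}}

current = localize(0.9, memory, similarity)           # node 1
path = shortest_path(graph, current, goal=8)          # [1, 0, 6, 7, 8]
waypoint = select_waypoint(0.9, path, memory, similarity)  # node 6
```

In this toy run the shortcut edge between nodes 0 and 6 lets the planner skip the out-and-back detour through nodes 2 to 5. The selected waypoint would then be handed to the locomotion network as its target observation.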

Experimental Evaluation and Results

Experiments are conducted in a 3D simulated environment, where agents must navigate to goals in mazes they have not seen before, given only 5 minutes of footage of each maze. The authors compare SPTM against several baselines built on recurrent and feedforward deep learning architectures.

Quantitatively, the average success rate of the SPTM agent across test environments is higher than that of the best-performing baseline by a factor of three. The navigation accuracy across test environments consistently hits a 100% success rate in trials capped at 5000 steps, which indicates that the method generalizes to unseen environments rather than relying on layouts memorized during training.

Implications and Future Work

The implications of SPTM suggest a shift towards incorporating biologically inspired navigation tactics into autonomous systems, especially those deployed in dynamic or unfamiliar terrains. By eschewing the necessity for prior metric information, the framework aligns well with natural navigation methods observed in animals, leveraging visual landmarks and topological knowledge.

Future work could explore enhancements such as incorporating noisy ego-motion estimates for further robustness, adaptive memory subsampling for scalability in extensive environments, and the potential for end-to-end trainability of the entire framework. The potential integration of SPTM into broader AI systems, capable of learning lifelong and in more open-ended settings, also presents a promising avenue for research.

In conclusion, the paper provides a significant contribution to the field of navigation in artificial agents, blending insights from psychological studies and modern deep learning advancements to yield a sophisticated, generalizable navigation paradigm. This work lays foundational concepts that could influence future advancements in both practical applications and theoretical developments within the field of AI-driven navigation systems.
