Learning to Navigate in Cities Without a Map (1804.00168v3)

Published 31 Mar 2018 in cs.AI

Abstract: Navigating through unstructured environments is a basic capability of intelligent creatures, and thus is of fundamental interest in the study and development of artificial intelligence. Long-range navigation is a complex cognitive task that relies on developing an internal representation of space, grounded by recognisable landmarks and robust visual processing, that can simultaneously support continuous self-localisation ("I am here") and a representation of the goal ("I am going there"). Building upon recent research that applies deep reinforcement learning to maze navigation problems, we present an end-to-end deep reinforcement learning approach that can be applied on a city scale. Recognising that successful navigation relies on integration of general policies with locale-specific knowledge, we propose a dual pathway architecture that allows locale-specific features to be encapsulated, while still enabling transfer to multiple cities. We present an interactive navigation environment that uses Google StreetView for its photographic content and worldwide coverage, and demonstrate that our learning method allows agents to learn to navigate multiple cities and to traverse to target destinations that may be kilometres away. The project webpage http://streetlearn.cc contains a video summarising our research and showing the trained agent in diverse city environments and on the transfer task, the form to request the StreetLearn dataset and links to further resources. The StreetLearn environment code is available at https://github.com/deepmind/streetlearn


Analysis of "Learning to Navigate in Cities Without a Map"

The paper "Learning to Navigate in Cities Without a Map" by Mirowski et al. addresses a hard problem in artificial intelligence and robotics: enabling an agent to navigate complex urban environments without access to an explicit map. Navigation systems traditionally build and query explicit map representations. The authors instead train agents in a way loosely analogous to how humans learn an unfamiliar city, relying only on visual cues and recognizable landmarks.

Methodology

Mirowski et al. frame the problem as reinforcement learning (RL) and apply deep reinforcement learning (DRL) at city scale. They introduce a dual pathway architecture that separates general navigation policies from locale-specific knowledge. Because the locale-specific features are encapsulated, learned navigation strategies can be transferred to other cities, which supports the adaptability and scalability of the approach.

A notable technical contribution is the StreetLearn environment, built from Google Street View imagery. It provides a realistic, high-dimensional visual simulation for training agents and covers large geographic areas in cities such as New York, London, and Paris. Within this environment, agents perform a "courier task": they navigate between distant goals without being given a map.
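The courier task described above can be sketched as a loop over a street graph. This is only an illustrative toy under stated assumptions: the six-node `GRAPH`, the reward of 1.0 per goal, and the functions `run_courier_episode`, `random_policy`, and `greedy_policy` are all hypothetical names and values, not the StreetLearn API or the paper's settings. The greedy policy is an oracle that reads the graph, which a map-free agent could not do; it is included only as an upper-bound baseline.

```python
import random
from collections import deque

# Hypothetical toy street graph: node -> neighbouring nodes (not StreetLearn data).
GRAPH = {
    0: [1, 3], 1: [0, 2, 4], 2: [1, 5],
    3: [0, 4], 4: [1, 3, 5], 5: [2, 4],
}


def shortest_path_length(graph, start, goal):
    """Breadth-first search hop count between two nodes."""
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    raise ValueError("goal unreachable")


def run_courier_episode(graph, policy, steps, rng):
    """Run one episode: each reached goal pays a reward, then a new goal is drawn."""
    position = rng.choice(list(graph))
    goal = rng.choice([n for n in graph if n != position])
    total_reward, goals_reached = 0.0, 0
    for _ in range(steps):
        position = policy(graph, position, goal, rng)
        if position == goal:
            total_reward += 1.0  # assumed reward value, for illustration only
            goals_reached += 1
            goal = rng.choice([n for n in graph if n != position])
    return total_reward, goals_reached


def random_policy(graph, position, goal, rng):
    """Untrained baseline: step to a uniformly random neighbour."""
    return rng.choice(graph[position])


def greedy_policy(graph, position, goal, rng):
    """Oracle baseline: steps along a shortest path using the graph itself."""
    return min(graph[position], key=lambda n: shortest_path_length(graph, n, goal))
```

Calling `run_courier_episode(GRAPH, greedy_policy, 20, random.Random(0))` and the same with `random_policy` gives a rough sense of the gap a learned, map-free policy has to close.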

To train their models, the authors use IMPALA, a scalable distributed RL algorithm known for its efficiency in multitask learning. Of particular interest is the dual LSTM architecture: one LSTM pathway handles the general navigation policy, while the other holds locale-specific knowledge and can be quickly retrained to adapt to a new city. This makes the design a modular approach to RL architecture.
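The modular split described above can be illustrated with a small sketch. It is not the authors' implementation: plain linear maps stand in for the LSTMs, the class `DualPathwayAgent` and its methods are hypothetical, and `apply_update` shifts weights by a constant where a real agent would apply a gradient from the RL loss. The point is only to show which parameters move during transfer to a new city.

```python
import random


class DualPathwayAgent:
    """Toy stand-in: linear maps replace the LSTM pathways described in the text."""

    def __init__(self, feature_dim, num_actions, seed=0):
        self._rng = random.Random(seed)
        self.feature_dim = feature_dim
        # Shared pathway: the general policy, reused in every city.
        self.policy_weights = self._matrix(num_actions, feature_dim)
        # Locale-specific pathways: one weight matrix per city.
        self.locale_weights = {}

    def _matrix(self, rows, cols):
        return [[self._rng.uniform(-0.1, 0.1) for _ in range(cols)]
                for _ in range(rows)]

    def add_city(self, city):
        """Create a fresh locale-specific pathway for a city."""
        self.locale_weights[city] = self._matrix(self.feature_dim, self.feature_dim)

    @staticmethod
    def _matvec(matrix, vector):
        return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

    def action_scores(self, city, observation):
        """Locale pathway first, then the shared policy pathway."""
        locale_features = self._matvec(self.locale_weights[city], observation)
        return self._matvec(self.policy_weights, locale_features)

    def trainable(self, city, transfer):
        """During transfer only the city's locale pathway may change."""
        params = [self.locale_weights[city]]
        if not transfer:
            params.append(self.policy_weights)
        return params

    def apply_update(self, city, step, transfer):
        """Placeholder update: adds `step` to every trainable weight."""
        for matrix in self.trainable(city, transfer):
            for row in matrix:
                for j in range(len(row)):
                    row[j] += step
```

With `transfer=True`, an update for a newly added city leaves both the shared policy weights and every other city's pathway untouched, which is the property that lets locale-specific knowledge be swapped in without retraining the general policy.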

Results and Analysis

The experimental results demonstrate successful navigation across large urban environments. The agents achieve high rewards and generalize their navigation policy to new, unseen goal locations within the environments they were trained on. The transfer learning experiments are especially interesting: the results indicate that much of the navigation capability is retained when the system is moved to new, unseen city sections, although the locale-specific pathway still needs additional fine-tuning.

On rewards and shaping, the paper analyses how to balance the reward structure so that it encourages goal-directed behavior while keeping long-range navigation practical. A curriculum learning strategy further helps training by progressing from simpler navigation tasks to more complex whole-city traversals, which stabilizes learning and eases the data inefficiency typical of deep RL.
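One way to picture the two ideas above, reward shaping and a distance curriculum, is the following sketch. The function names, the linear forms, and the default values (a 200 m shaping radius, goal distances growing from 500 m to 5 km) are assumptions chosen for illustration; this summary does not state the paper's actual schedule.

```python
def shaped_reward(distance_to_goal_m, goal_reward=1.0, shaping_radius_m=200.0):
    """Full reward at the goal; a partial reward that grows as the agent nears it.

    The radius and the linear ramp are illustrative assumptions.
    """
    if distance_to_goal_m <= 0.0:
        return goal_reward
    if distance_to_goal_m >= shaping_radius_m:
        return 0.0
    return goal_reward * (1.0 - distance_to_goal_m / shaping_radius_m)


def curriculum_max_distance(progress, start_m=500.0, end_m=5000.0):
    """Linearly widen the allowed goal distance as training progress goes 0 -> 1.

    Goals would be sampled no farther than the returned distance (in metres).
    """
    progress = min(max(progress, 0.0), 1.0)  # clamp out-of-range progress
    return start_m + progress * (end_m - start_m)
```

Early in training the agent only sees nearby goals and receives a signal before it reaches them; as `progress` approaches 1 the goals may lie anywhere within the larger radius.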

Potential Implications and Future Directions

This work has several implications for AI-driven navigation systems. Practically, such systems could benefit automated delivery services or autonomous vehicle navigation in urban centers, where traditional GPS-based navigation can be unreliable or inefficient. Theoretically, the paper raises questions about the role of visual and cognitive cues in building robust navigation algorithms that do not depend on conventional geographic representations.

Future work could investigate expanding this modular and scalable framework to other domains, such as indoor navigation or other complex environments where map representations are challenging to obtain and maintain. Further research might also focus on the intersection of such navigation systems with other sensory modalities to augment their robustness against dynamic changes in the environment. Additionally, as RL algorithms continue to mature, improvements in data efficiency and generalization are likely to unlock even broader applications for this approach.

Overall, Mirowski et al.'s paper is a substantial step towards AI systems that can navigate complex real-world environments with minimal prior information, advancing both the theoretical understanding of navigation and its practical applications.


Authors (10)

Piotr Mirowski (20 papers)
Matthew Koichi Grimes (3 papers)
Mateusz Malinowski (41 papers)
Karl Moritz Hermann (22 papers)
Keith Anderson (9 papers)
Denis Teplyashin (10 papers)
Karen Simonyan (54 papers)
Koray Kavukcuoglu (57 papers)
Andrew Zisserman (248 papers)
Raia Hadsell (50 papers)

Citations (307)


GitHub

GitHub - google-deepmind/streetlearn: A C++/Python implementation of the StreetLearn environment based on images from Street View, as well as a TensorFlow implementation of goal-driven navigation agents solving the task published in “Learning to Navigate in Cities Without a Map”, NeurIPS 2018 (293 stars)