ObjectNav Revisited: On Evaluation of Embodied Agents Navigating to Objects (2006.13171v2)

Published 23 Jun 2020 in cs.CV and cs.RO

Abstract: We revisit the problem of Object-Goal Navigation (ObjectNav). In its simplest form, ObjectNav is defined as the task of navigating to an object, specified by its label, in an unexplored environment. In particular, the agent is initialized at a random location and pose in an environment and asked to find an instance of an object category, e.g., find a chair, by navigating to it. As the community begins to show increased interest in semantic goal specification for navigation tasks, a number of different often-inconsistent interpretations of this task are emerging. This document summarizes the consensus recommendations of this working group on ObjectNav. In particular, we make recommendations on subtle but important details of evaluation criteria (for measuring success when navigating towards a target object), the agent's embodiment parameters, and the characteristics of the environments within which the task is carried out. Finally, we provide a detailed description of the instantiation of these recommendations in challenges organized at the Embodied AI workshop at CVPR 2020 (http://embodied-ai.org).

Authors (8)

Dhruv Batra (160 papers)
Aaron Gokaslan (33 papers)
Aniruddha Kembhavi (79 papers)
Oleksandr Maksymets (17 papers)
Roozbeh Mottaghi (66 papers)
Manolis Savva (64 papers)
Alexander Toshev (48 papers)
Erik Wijmans (25 papers)

Citations (220)


Summary

Overview of "ObjectNav Revisited: On Evaluation of Embodied Agents Navigating to Objects"

This paper revisits the Object-Goal Navigation (ObjectNav) task in embodied AI. In ObjectNav, an agent starts at a random location and pose in an unexplored environment and must navigate to an instance of an object category specified only by its label (for example, "find a chair"). The task is a building block for robots that have to carry out more complex tasks in settings they have not seen before.

The authors observe that as interest in semantic navigation grows, inconsistent interpretations of ObjectNav have arisen. This paper aims to provide standardized recommendations on the evaluation protocols, agent embodiment parameters, and environment characteristics for the ObjectNav task, ensuring clarity and consistency across the research community.

Key Contributions

Evaluation Protocols: The paper sets out precise success criteria for ObjectNav episodes and recommends Success weighted by Path Length (SPL) as an evaluation metric, since it reflects both whether the agent reached the target and how efficient its path was. The authors acknowledge shortcomings of SPL, such as high variance and insensitivity to minor errors, and suggest that future metrics should address them.
Agent Embodiment: The authors advocate a balance between realistic control and manageable complexity. They recommend discrete actions that emulate differential-drive mobility, together with realistic sensing through RGB-D cameras and localization (e.g., GPS+Compass).
Environment Specifications: The authors recommend 3D-scanned environments with authentic layouts and high visual fidelity, so that simulated scenes resemble real-world ones, which matters for sim-to-real transfer. Datasets such as Matterport3D and Gibson are highlighted as suitable examples.
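
As a rough illustration of the SPL metric named above, the sketch below follows the commonly used definition: each episode contributes its success indicator multiplied by the ratio of the shortest-path length to the longer of the shortest and the actually travelled path, and the result is averaged over episodes. The function name `spl` and its argument names are hypothetical and chosen for this example; they are not taken from the paper or from any simulator API.

```python
from typing import Sequence


def spl(
    successes: Sequence[bool],
    shortest_lengths: Sequence[float],
    path_lengths: Sequence[float],
) -> float:
    """Success weighted by Path Length, averaged over episodes.

    successes[i]        -- whether episode i ended in success
    shortest_lengths[i] -- shortest-path length from start to goal (assumed > 0)
    path_lengths[i]     -- length of the path the agent actually took
    """
    n = len(successes)
    if n == 0:
        raise ValueError("need at least one episode")
    if not (n == len(shortest_lengths) == len(path_lengths)):
        raise ValueError("all inputs must have the same length")

    total = 0.0
    for success, shortest, taken in zip(successes, shortest_lengths, path_lengths):
        if shortest <= 0:
            raise ValueError("shortest-path length must be positive")
        if success:
            # A path can never beat the shortest one, so the ratio is capped at 1.
            total += shortest / max(taken, shortest)
        # Failed episodes contribute 0 regardless of path length.
    return total / n


# Three episodes: optimal success, success with a 2x detour, and a failure.
score = spl([True, True, False], [5.0, 4.0, 3.0], [5.0, 8.0, 2.0])
print(score)  # 0.5
```

Because a failed episode scores zero however close the agent came, and a success scores between zero and one, per-episode values are spread widely, which is one way to see the variance concern raised by the authors.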

Challenges and Recommendations

The authors stress the need to define ObjectNav tasks precisely, covering the success criteria, the agent's form, and its action capabilities. They also discuss the consequences of specific design choices, such as how collision dynamics shape learned policies. In particular, they recommend disabling sliding along obstacles on collision, so that learned policies cannot exploit behavior a physical robot would not exhibit.
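
To make the collision recommendation concrete, here is a minimal sketch of a discrete-action step function in a 2D world. With sliding disabled, a forward move that would hit an obstacle leaves the agent where it was; the optional sliding branch is included only for contrast. Everything here is an assumption made for illustration: the `Pose` type, the `step` function, the `is_free` callback, the step size and turn angle, and the crude axis-aligned slide are not taken from the paper or from any simulator.

```python
import math
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Pose:
    x: float
    y: float
    heading_deg: float  # 0 points along +x, counter-clockwise is positive


FORWARD_STEP_M = 0.25  # assumed step size, for illustration only
TURN_ANGLE_DEG = 30.0  # assumed turn angle, for illustration only


def step(
    pose: Pose,
    action: str,
    is_free: Callable[[float, float], bool],
    allow_sliding: bool = False,
) -> Pose:
    """Apply one discrete action and return the new pose."""
    if action == "turn_left":
        return Pose(pose.x, pose.y, (pose.heading_deg + TURN_ANGLE_DEG) % 360.0)
    if action == "turn_right":
        return Pose(pose.x, pose.y, (pose.heading_deg - TURN_ANGLE_DEG) % 360.0)
    if action == "stop":
        return pose
    if action != "move_forward":
        raise ValueError(f"unknown action: {action!r}")

    dx = FORWARD_STEP_M * math.cos(math.radians(pose.heading_deg))
    dy = FORWARD_STEP_M * math.sin(math.radians(pose.heading_deg))
    if is_free(pose.x + dx, pose.y + dy):
        return Pose(pose.x + dx, pose.y + dy, pose.heading_deg)

    if allow_sliding:
        # Crude slide: keep whichever axis component of the move is unobstructed.
        if is_free(pose.x + dx, pose.y):
            return Pose(pose.x + dx, pose.y, pose.heading_deg)
        if is_free(pose.x, pose.y + dy):
            return Pose(pose.x, pose.y + dy, pose.heading_deg)

    # No sliding: a blocked move leaves the agent in place.
    return pose


# A wall occupies everything at x >= 1.0; the agent faces it at an angle.
wall_free = lambda x, y: x < 1.0
start = Pose(0.9, 0.0, 30.0)
print(step(start, "move_forward", wall_free))                      # unchanged pose
print(step(start, "move_forward", wall_free, allow_sliding=True))  # slides along the wall
```

With sliding enabled, repeatedly issuing `move_forward` lets the agent glide along the wall without ever turning, which is the kind of shortcut a policy could learn in simulation but not reproduce on hardware.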

A significant portion of the paper details the task definitions and the structure of the challenges that instantiate them on the Habitat and RoboTHOR platforms. For each platform, the paper describes the selection of environments, the action space, the success criteria, and the sensing capabilities, so that results are consistent and reproducible.
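
A success criterion of the kind described above can be sketched as a small predicate over the end state of an episode. The version below assumes three conditions: the agent explicitly signalled that it was done, it is within a distance threshold of some instance of the target category, and the target is visible. The names `EpisodeEnd` and `is_success`, the 1.0 m default threshold, and the use of straight-line distance in 2D are all assumptions for this example; consult each challenge's own specification for the exact rules.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class EpisodeEnd:
    called_stop: bool     # did the agent explicitly end the episode?
    agent_xy: Point       # final agent position
    target_visible: bool  # could the target be seen from the final pose?


def is_success(
    end: EpisodeEnd,
    target_instances: Iterable[Point],
    success_distance_m: float = 1.0,  # placeholder threshold, an assumption
    require_visibility: bool = True,
) -> bool:
    """Return True if the episode counts as a success under the assumed rules."""
    if not end.called_stop:
        return False
    if require_visibility and not end.target_visible:
        return False
    # Any instance of the category counts, so measure to the nearest one.
    nearest = min(
        (math.dist(end.agent_xy, t) for t in target_instances),
        default=math.inf,
    )
    return nearest <= success_distance_m


chairs = [(0.0, 0.0), (5.0, 5.0)]
print(is_success(EpisodeEnd(True, (0.6, 0.0), True), chairs))   # True
print(is_success(EpisodeEnd(False, (0.6, 0.0), True), chairs))  # False: never stopped
```

Requiring an explicit stop means an agent that merely passes near the object does not get credit, and measuring to the nearest instance reflects that the goal is a category rather than one particular object.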

Implications and Future Directions

By establishing a common framework for ObjectNav, the paper makes results from different groups directly comparable and supports systematic evaluation across embodied AI research. It also encourages the community to develop navigation agents that generalize across environments and object categories.

Work on evaluation metrics beyond SPL may yield more nuanced assessments of the task. The paper's recommendations on embodiment, collision handling, and environment realism also give a starting point for narrowing the gap between simulation and real-world execution.

Future work is likely to benefit from these standardized benchmarks, since they allow algorithms to be built and tested in environments that more closely resemble the real world. The benchmarks themselves may be refined as simulation fidelity and robot capabilities improve.
