An Overview of "A Survey of Embodied AI: From Simulators to Research Tasks"
The paper "A Survey of Embodied AI: From Simulators to Research Tasks" provides a comprehensive exploration of the current state of embodied AI, delineating the transition from traditional internet-based AI to systems where artificial agents interact with environments for learning. This paradigm shift aligns closely with the pursuit of AGI by facilitating real-world experiential learning, much like human cognition. This document not only surveys existing simulators critical for conducting embodied AI research but also explores the main research tasks fostered by these simulators—visual exploration, visual navigation, and embodied question answering.
Embodied AI Simulators
The paper evaluates nine embodied AI simulators: DeepMind Lab, AI2-THOR, CHALET, VirtualHome, VRKitchen, Habitat-Sim, iGibson, SAPIEN, and ThreeDWorld, all developed within the four years preceding the survey. The comparison rests on seven features: Environment, Physics, Object Type, Object Property, Controller, Action, and Multi-Agent, with each feature discussed in terms of its contribution to realism, scalability, and interactivity. These simulators serve diverse roles, from replicating physical interactions with advanced physics engines to rendering photorealistic scenes well suited to training AI agents.
Realism, a primary dimension emphasized in the survey, covers both environmental fidelity and physics modeling, factors essential for transferring simulation-trained agents to real-world applications. Scalability concerns the ease with which a simulator can incorporate large collections of objects and environments. Notably, iGibson and Habitat-Sim are highlighted for their use in visual navigation and exploration tasks, owing to their world-based scenes, constructed from 3D scans of real spaces, which offer high visual fidelity.
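As a concrete illustration of how such simulators are driven programmatically, the sketch below creates a Habitat-Sim instance, attaches an RGB camera to an agent, and steps it with a discrete action. This is a minimal sketch, not an excerpt from the paper: the scene path is a placeholder, and exact class and action names vary somewhat across Habitat-Sim releases.

```python
import habitat_sim

# Point the simulator at a 3D scene asset (placeholder path).
sim_cfg = habitat_sim.SimulatorConfiguration()
sim_cfg.scene_id = "data/scene_datasets/example/scene.glb"

# Give the agent a single RGB camera sensor.
rgb_spec = habitat_sim.CameraSensorSpec()
rgb_spec.uuid = "color_sensor"
rgb_spec.sensor_type = habitat_sim.SensorType.COLOR
rgb_spec.resolution = [256, 256]

agent_cfg = habitat_sim.agent.AgentConfiguration()
agent_cfg.sensor_specifications = [rgb_spec]

# Create the simulator and take one step with a default discrete action.
sim = habitat_sim.Simulator(habitat_sim.Configuration(sim_cfg, [agent_cfg]))
observations = sim.step("move_forward")  # dict keyed by sensor uuid
rgb_frame = observations["color_sensor"]
sim.close()
```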
Embodied AI Research Tasks
Embodied AI research tasks supported by these simulators fall into three categories: visual exploration, visual navigation, and embodied QA, forming a natural progression of complexity akin to a pyramid. Visual exploration focuses on agents acquiring and interpreting 3D models of their environment for use in downstream tasks, employing techniques such as SLAM and curiosity-driven exploration (sketched below). The resulting maps and representations are fundamental to visual navigation, where the aim is to reach a specified goal, such as an object or a point, using policies informed by learned spatial maps or trained end to end with reinforcement learning.
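To make the curiosity-driven idea concrete, one widely used formulation rewards the agent in proportion to how poorly a learned forward-dynamics model predicts the next observation embedding, pushing the agent toward unfamiliar states. The PyTorch sketch below is a minimal illustration under assumed network sizes and a one-hot action encoding; it is not the survey's specification.

```python
import torch
import torch.nn as nn

class ForwardDynamics(nn.Module):
    """Predicts the next state embedding from the current embedding and action."""
    def __init__(self, embed_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim + action_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, embed_dim),
        )

    def forward(self, phi_s: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([phi_s, action], dim=-1))

def intrinsic_reward(model, phi_s, action, phi_s_next):
    """Curiosity bonus: the forward model's error predicting the next embedding."""
    with torch.no_grad():
        predicted = model(phi_s, action)
    return 0.5 * (predicted - phi_s_next).pow(2).sum(dim=-1)

# Toy usage with random embeddings and a one-hot action.
model = ForwardDynamics(embed_dim=128, action_dim=4)
phi_s, phi_s_next = torch.randn(1, 128), torch.randn(1, 128)
action = torch.eye(4)[[2]]  # one-hot encoding of the chosen action
print(intrinsic_reward(model, phi_s, action, phi_s_next))
```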
Visual navigation tasks include point navigation and object navigation, along with richer variants such as vision-and-language navigation (VLN) and interactive question answering (IQA). These tasks demand a blend of semantic understanding, interaction capability, and reasoning to handle challenges such as navigating with prior knowledge or following natural-language instructions. This integration marks a step toward more complex autonomous systems capable of robust multi-modal interaction.
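For point navigation in particular, the goal is commonly given to the agent as a polar vector, i.e., the distance and relative heading to the target, recomputed as the agent moves. The snippet below sketches that computation in 2D; the coordinate and heading conventions are illustrative assumptions.

```python
import numpy as np

def pointgoal(agent_xy, agent_heading_rad, goal_xy):
    """Express the goal as (distance, relative angle) in the agent's frame."""
    delta = np.asarray(goal_xy, dtype=float) - np.asarray(agent_xy, dtype=float)
    rho = float(np.linalg.norm(delta))             # distance to goal
    phi = np.arctan2(delta[1], delta[0]) - agent_heading_rad
    phi = (phi + np.pi) % (2 * np.pi) - np.pi      # wrap to [-pi, pi]
    return rho, phi

# Agent at the origin facing +x; goal 3 m ahead and 4 m to the left.
print(pointgoal((0.0, 0.0), 0.0, (3.0, 4.0)))  # -> (5.0, ~0.927 rad)
```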
Embodied Question Answering
Embodied QA represents the apex of this pyramid, fusing sensory input, spatial reasoning, and linguistic comprehension so that agents can answer questions grounded in their environment. Existing frameworks divide the task into navigation and QA sub-tasks, emphasizing their symbiotic nature. The paper also examines open challenges in embodied QA, such as multi-target questions that require complex task execution like comparing objects across locations, a testament to the field's rapidly evolving landscape.
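One common way to operationalize this decomposition is a two-stage loop: a navigation policy moves the agent until it decides to stop, and an answering module then predicts an answer from the observations collected along the way. The sketch below assumes hypothetical env, navigator, and answerer interfaces purely for illustration; actual systems wire these to a simulator and trained models.

```python
def embodied_qa(env, question, navigator, answerer, max_steps=200):
    """Two-stage embodied QA: navigate toward relevant content, then answer.

    env, navigator, and answerer are hypothetical interfaces used purely
    to illustrate the navigation/QA decomposition.
    """
    obs = env.reset()
    frames = [obs["rgb"]]                        # keep the visual trajectory
    for _ in range(max_steps):
        action = navigator.act(obs, question)    # e.g. "move_forward" or "STOP"
        if action == "STOP":                     # policy signals it has arrived
            break
        obs = env.step(action)
        frames.append(obs["rgb"])
    # The answerer reasons over the gathered visual evidence.
    return answerer.answer(frames, question)
```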
Conclusions and Future Directions
The survey underscores the significance of the identified simulators and tasks in advancing embodied AI research, accentuating both the opportunities and the challenges present. It identifies the development of simulators with advanced physics features and richer interaction dynamics as critical for the next wave of research innovations. Among the future directions, Task-based Interactive Question Answering (TIQA) is proposed to integrate task execution with interactive QA more tightly, steering the field closer to genuine general intelligence.
In conclusion, this survey delivers a methodical and expansive understanding of embodied AI, spotlighting both the enabling tools and the intricate tasks they support. It aims to guide upcoming research by aligning simulator selection with task requirements, ultimately nurturing advances toward more generalized AI systems. This well-curated compendium will serve as a vital reference point for researchers navigating this rapidly developing domain of AI.