Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI (2407.06886v7)

Published 9 Jul 2024 in cs.CV, cs.AI, cs.LG, cs.MA, and cs.RO

Abstract: Embodied Artificial Intelligence (Embodied AI) is crucial for achieving AGI and serves as a foundation for various applications that bridge cyberspace and the physical world. Recently, the emergence of Multi-modal Large Models (MLMs) and World Models (WMs) have attracted significant attention due to their remarkable perception, interaction, and reasoning capabilities, making them a promising architecture for the brain of embodied agents. However, there is no comprehensive survey for Embodied AI in the era of MLMs. In this survey, we give a comprehensive exploration of the latest advancements in Embodied AI. Our analysis firstly navigates through the forefront of representative works of embodied robots and simulators, to fully understand the research focuses and their limitations. Then, we analyze four main research targets: 1) embodied perception, 2) embodied interaction, 3) embodied agent, and 4) sim-to-real adaptation, covering the state-of-the-art methods, essential paradigms, and comprehensive datasets. Additionally, we explore the complexities of MLMs in virtual and real embodied agents, highlighting their significance in facilitating interactions in dynamic digital and physical environments. Finally, we summarize the challenges and limitations of embodied AI and discuss their potential future directions. We hope this survey will serve as a foundational reference for the research community and inspire continued innovation. The associated project can be found at https://github.com/HCPLab-SYSU/Embodied_AI_Paper_List.


Aligning Cyber Space with the Physical World: A Comprehensive Survey on Embodied AI

The paper presents a comprehensive survey on Embodied AI, emphasizing the integration of cyber space with the physical world. Embodied AI is positioned as a critical foundation for achieving AGI by facilitating meaningful interactions between virtual environments and physical entities. This survey addresses key advancements in Embodied AI, particularly in the context of Multi-modal Large Models (MLMs) and World Models (WMs), which have emerged as potent architectures for embodied agents.

Overview of Embodied AI Components

The survey systematically explores four main research targets: embodied perception, embodied interaction, embodied agents, and sim-to-real adaptation. It discusses the significance of MLMs in enhancing the perception and decision-making capabilities of embodied agents, allowing them to interact effectively in dynamic digital and physical environments.

Numerical Insights and Representative Models

The paper notes the growing interest in Embodied AI, citing approximately 10,700 publications in 2023. It highlights representative models such as RT-2 and RT-H, which illustrate how MLMs are being integrated into general-purpose embodied agents. These models still face limitations, including restricted long-term memory and difficulty decomposing complex tasks.

Embodied AI Tasks and Challenges

The survey examines embodied perception, focusing on active visual perception and 3D scene understanding, and how these capabilities support agents' interactions with their environments. It discusses embodied interaction through tasks such as Embodied Question Answering (EQA) and embodied grasping, emphasizing the need for a unified framework that incorporates human-like reasoning and understanding.

Sim-to-Real Adaptation

A critical component of the survey is the examination of sim-to-real adaptation, where the authors explore the use of embodied world models and advanced simulation techniques to transfer capabilities from virtual to real-world scenarios. They outline the practical challenges and potential solutions in achieving fidelity between simulated and physical environments.

Future Directions and Theoretical Implications

The survey concludes by identifying future research directions, focusing on high-quality dataset acquisition, effective utilization of human demonstration data, and the development of robust embodied world models. There is a call for unified benchmarks to evaluate the holistic capabilities of embodied AI systems, considering both high-level task planning and low-level control mechanisms.

In summary, this paper provides a thorough exploration of the current state and future prospects of Embodied AI. It emphasizes the critical role of integrating diverse sensory modalities and advanced modeling techniques to enhance the interaction between cyberspace and the physical world, in pursuit of the broader goal of AGI. Future developments will likely focus on improving generalization so that embodied agents can be applied across a wider range of domains.