
HAZARD Challenge: Embodied Decision Making in Dynamically Changing Environments (2401.12975v1)

Published 23 Jan 2024 in cs.CV, cs.AI, and cs.CL

Abstract: Recent advances in high-fidelity virtual environments serve as one of the major driving forces for building intelligent embodied agents to perceive, reason and interact with the physical world. Typically, these environments remain unchanged unless agents interact with them. However, in real-world scenarios, agents might also face dynamically changing environments characterized by unexpected events and need to rapidly take action accordingly. To remedy this gap, we propose a new simulated embodied benchmark, called HAZARD, specifically designed to assess the decision-making abilities of embodied agents in dynamic situations. HAZARD consists of three unexpected disaster scenarios, including fire, flood, and wind, and specifically supports the utilization of LLMs to assist common sense reasoning and decision-making. This benchmark enables us to evaluate autonomous agents' decision-making capabilities across various pipelines, including reinforcement learning (RL), rule-based, and search-based methods in dynamically changing environments. As a first step toward addressing this challenge using LLMs, we further develop an LLM-based agent and perform an in-depth analysis of its promise and challenge of solving these challenging tasks. HAZARD is available at https://vis-www.cs.umass.edu/hazard/.

Authors (9)
  1. Qinhong Zhou
  2. Sunli Chen
  3. Yisong Wang
  4. Haozhe Xu
  5. Weihua Du
  6. Hongxin Zhang
  7. Yilun Du
  8. Joshua B. Tenenbaum
  9. Chuang Gan
Citations (8)

Summary

Introduction

Embodied agents, or robots, are increasingly becoming a part of our daily lives, assisting with tasks ranging from mundane chores to complex rescue operations. To function effectively, these agents must not only navigate but also make real-time decisions in response to dynamic changes in their environment. Evaluation of these capabilities has been largely neglected, as existing simulation platforms prioritize agent-driven interactions over the spontaneous, unpredictable changes that characterize real-world settings. The paper addresses this gap by proposing the HAZARD challenge, which evaluates an agent's ability to make decisions in the presence of unexpected disasters such as fire, flood, and wind.

Dynamic Environments in Embodied AI

The HAZARD challenge builds upon the ThreeDWorld (TDW) platform, augmenting it with physical simulation and visual effects that can handle intricate environmental changes, from flames engulfing a room to floods submerging objects. Agents are tasked with rescuing valuable items by accurately perceiving the situation, reasoning about the effects of dynamic changes like rising temperatures or water levels, and planning a rescue strategy. A key innovation of this work is the integration of LLMs to assist with the tasks by providing a semantic understanding of observations, thereby enhancing decision-making.
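To make the dynamics concrete, the following is a minimal, purely illustrative sketch of how a fire scenario might update per-object state each simulation step; the constants, field names, and spread model are assumptions for illustration, not the benchmark's actual TDW-based implementation.

```python
# Hypothetical sketch of a fire-scenario state update: heat sources warm
# nearby objects each step, and objects that cross an ignition threshold
# become new heat sources. All names and constants are illustrative.
AMBIENT = 20.0       # assumed baseline room temperature (Celsius)
IGNITION = 250.0     # assumed temperature at which an object catches fire

def step_fire(objects, heat_sources, spread_rate=0.1):
    """Advance one step: heating falls off with distance to the nearest flame."""
    for obj in objects:
        nearest = min(abs(obj["pos"] - s) for s in heat_sources)
        obj["temp"] += spread_rate * max(0.0, 500.0 - obj["temp"]) / (1.0 + nearest)
        if obj["temp"] >= IGNITION and not obj["burning"]:
            obj["burning"] = True
            heat_sources.append(obj["pos"])  # burning objects spread the fire
    return objects

# A 1-D toy scene: three objects at increasing distance from a flame at 0.0.
objects = [{"pos": p, "temp": AMBIENT, "burning": False} for p in (1.0, 3.0, 6.0)]
sources = [0.0]
for _ in range(50):
    step_fire(objects, sources)
```

The point of such a model is that the environment changes whether or not the agent acts, so rescue plans must account for which objects will be damaged first.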

Leveraging LLMs for Decision Making

To optimally leverage LLMs, the authors developed APIs that integrate visual and historical information into textual formats, reducing the frequency of queries needed to make decisions. This setup allowed for evaluating various decision-making pipelines, including LLM-based agents, against rule-based, search-based, and reinforcement learning approaches. Findings revealed that while LLMs could process basic environmental factors successfully, they faced challenges with more complex elements such as predicting dynamic changes or maintaining consistency between reasoning and prediction.
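A rough sketch of the kind of observation-to-text conversion described above is shown below. The function and field names are assumptions, not the paper's actual API; the idea is simply to pack visual and historical state into one compact prompt so the LLM is queried once per decision rather than per observation.

```python
# Illustrative (hypothetical) serialization of agent observations into a
# single text prompt for an LLM-based decision step.
def observations_to_prompt(objects, agent_state, history):
    lines = ["You are a rescue agent. Current observations:"]
    for obj in objects:
        lines.append(
            f"- {obj['name']}: value={obj['value']}, "
            f"distance={obj['distance']:.1f}m, status={obj['status']}"
        )
    lines.append(f"Agent holding: {agent_state.get('holding') or 'nothing'}")
    if history:
        lines.append("Recent actions: " + ", ".join(history[-3:]))
    lines.append("Which object should be rescued next? Answer with its name.")
    return "\n".join(lines)

prompt = observations_to_prompt(
    [{"name": "laptop", "value": 5, "distance": 2.3, "status": "dry"},
     {"name": "vase", "value": 3, "distance": 1.1, "status": "partly flooded"}],
    {"holding": None},
    ["walk_to vase", "explore"],
)
```

Batching state into text this way trades per-step reactivity for fewer, richer LLM queries, which matters when each query is slow relative to the simulation clock.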

Experiments and Results

Experimental setups for the HAZARD challenge involved an array of indoor and outdoor scenarios, with a procedural generation pipeline ensuring varied and unpredictable environments. Agents were equipped with a refined action space for task execution, and their performance was measured across metrics such as rescue value rate, rescue step, and damage rate. LLM pipelines, using backbones like Llama-13b, GPT-3.5, and GPT-4, demonstrated notable decision-making skills even in zero-shot settings, with GPT-4 performing best. However, when perception challenges were introduced, such as objects being obscured by environmental conditions, all methods suffered performance degradation, highlighting perception as a critical area for further research.
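The three metrics can be sketched as below, under assumed definitions (the paper's exact formulas may differ): rescue value rate as saved value over total value, rescue step as average steps per successful rescue, and damage rate as damaged value over total value.

```python
# Minimal sketch of the three evaluation metrics, with assumed definitions.
def evaluate(episodes):
    total_value = sum(e["total_value"] for e in episodes)
    saved = sum(e["saved_value"] for e in episodes)
    damaged = sum(e["damaged_value"] for e in episodes)
    rescues = sum(e["num_rescued"] for e in episodes)
    steps = sum(e["steps"] for e in episodes)
    return {
        "value_rate": saved / total_value,     # fraction of value rescued
        "rescue_step": steps / max(rescues, 1),  # avg steps per rescue
        "damage_rate": damaged / total_value,  # fraction of value damaged
    }

metrics = evaluate([
    {"total_value": 10, "saved_value": 6, "damaged_value": 2,
     "num_rescued": 3, "steps": 120},
])
# metrics["value_rate"] == 0.6
```

Note that value rate and damage rate need not sum to one, since some items may be neither rescued nor damaged when an episode ends.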

Conclusion

The HAZARD challenge represents a significant step forward in the domain of embodied AI, moving towards the assessment of agents' decision-making abilities in response to dynamic environmental changes. The work underlines the potential of incorporating LLMs into embodied AI tasks and opens up new avenues for future research, including action development to mitigate disaster effects and the integration of more complex decision-making capabilities. The findings from the HAZARD challenge contribute to our understanding of the intersection between embodied AI and disaster management, with implications for autonomous agents' real-world applicability in safety-critical applications.
