HAZARD Challenge: Embodied Decision Making in Dynamically Changing Environments (2401.12975v1)
Abstract: Recent advances in high-fidelity virtual environments are a major driving force behind building intelligent embodied agents that perceive, reason about, and interact with the physical world. Typically, these environments remain unchanged unless agents interact with them. In real-world scenarios, however, agents may also face dynamically changing environments characterized by unexpected events and must act quickly in response. To address this gap, we propose a new simulated embodied benchmark, called HAZARD, specifically designed to assess the decision-making abilities of embodied agents in dynamic situations. HAZARD consists of three unexpected disaster scenarios (fire, flood, and wind) and specifically supports the use of large language models (LLMs) to assist common-sense reasoning and decision-making. The benchmark enables us to evaluate autonomous agents' decision-making capabilities across various pipelines, including reinforcement learning (RL), rule-based, and search-based methods, in dynamically changing environments. As a first step toward addressing this challenge using LLMs, we further develop an LLM-based agent and perform an in-depth analysis of its promise and the challenges it faces in solving these tasks. HAZARD is available at https://vis-www.cs.umass.edu/hazard/.
Authors: Qinhong Zhou, Sunli Chen, Yisong Wang, Haozhe Xu, Weihua Du, Hongxin Zhang, Yilun Du, Joshua B. Tenenbaum, Chuang Gan