MineLand: Simulating Large-Scale Multi-Agent Interactions with Limited Multimodal Senses and Physical Needs (2403.19267v2)
Abstract: While Vision-Language Models (VLMs) hold promise for tasks requiring extensive collaboration, traditional multi-agent simulators have facilitated rich explorations of an interactive artificial society that reflects collective behavior. However, these existing simulators face significant limitations. First, they struggle to handle large numbers of agents due to high resource demands. Second, they often assume agents possess perfect information and limitless capabilities, which undermines the ecological validity of simulated social interactions. To bridge this gap, we propose MineLand, a multi-agent Minecraft simulator that introduces three key features: large-scale scalability, limited multimodal senses, and physical needs. Our simulator supports 64 or more agents. Agents have limited visual, auditory, and environmental awareness, forcing them to actively communicate and collaborate to fulfill physical needs like food and resources. We also introduce an AI agent framework, Alex, inspired by multitasking theory, enabling agents to handle intricate coordination and scheduling. Our experiments demonstrate that the simulator, the corresponding benchmark, and the AI agent framework contribute to more ecologically valid and nuanced collective behavior. The source code of MineLand and Alex is openly available at https://github.com/cocacola-lab/MineLand.