CivRealm: A Learning and Reasoning Odyssey in Civilization for Decision-Making Agents (2401.10568v2)
Abstract: The generalization of decision-making agents encompasses two fundamental elements: learning from past experiences and reasoning in novel contexts. However, the predominant emphasis in most interactive environments is on learning, often at the expense of complexity in reasoning. In this paper, we introduce CivRealm, an environment inspired by the Civilization game. Civilization's profound alignment with human history and society necessitates sophisticated learning, while its ever-changing situations demand strong reasoning to generalize. Particularly, CivRealm sets up an imperfect-information general-sum game with a changing number of players; it presents a plethora of complex features, challenging the agent to deal with open-ended stochastic environments that require diplomacy and negotiation skills. Within CivRealm, we provide interfaces for two typical agent types: tensor-based agents that focus on learning, and language-based agents that emphasize reasoning. To catalyze further research, we present initial results for both paradigms. The canonical RL-based agents exhibit reasonable performance in mini-games, whereas both RL- and LLM-based agents struggle to make substantial progress in the full game. Overall, CivRealm stands as a unique learning and reasoning challenge for decision-making agents. The code is available at https://github.com/bigai-ai/civrealm.
Authors:
- Siyuan Qi
- Shuo Chen
- Yexin Li
- Xiangyu Kong
- Junqi Wang
- Bangcheng Yang
- Pring Wong
- Yifan Zhong
- Xiaoyuan Zhang
- Zhaowei Zhang
- Nian Liu
- Wei Wang
- Yaodong Yang
- Song-Chun Zhu