Will GPT-4 Run DOOM? (2403.05468v1)
Abstract: We show that GPT-4's reasoning and planning capabilities extend to the 1993 first-person shooter Doom. The LLM is able to run and play the game with only a few instructions, plus a textual description of the observed game state generated by the model itself from screenshots. We find that GPT-4 can play the game to a passable degree: it is able to operate doors, combat enemies, and perform pathing. More complex prompting strategies involving multiple model calls yield better results. While further work is required for the LLM to play the game as well as its classical, reinforcement-learning-based counterparts, we note that GPT-4 required no training, relying instead on its own reasoning and observational capabilities. We hope our work pushes the boundaries of intelligent, LLM-based agents in video games. We conclude by discussing the ethical implications of our work.
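The abstract describes an observe-describe-act loop: one model call turns a screenshot into a textual state description, and a second call plans an action from that description. A minimal sketch of such a loop is below; the function names (`describe_screenshot`, `choose_action`, `play_step`) and the rule-based stand-ins for the two model calls are hypothetical illustrations, not the paper's actual implementation or API.

```python
# Hypothetical sketch of the observe-describe-act loop the abstract outlines.
# In the real system, both stand-in functions below would be LLM calls:
# a vision-capable model for describing, and a planning model for acting.

ACTIONS = ["FORWARD", "TURN_LEFT", "TURN_RIGHT", "FIRE", "USE"]

def describe_screenshot(screenshot) -> str:
    """Stand-in for a vision-model call that turns pixels into text."""
    # A fixed description, purely for illustration.
    return "A corridor with a door ahead and an enemy to the left."

def choose_action(description: str, history: list) -> str:
    """Stand-in for a planning-model call over the textual description."""
    if "enemy" in description:
        return "FIRE"
    if "door" in description:
        return "USE"
    return "FORWARD"

def play_step(screenshot, history: list) -> str:
    """One iteration of the loop: describe, plan, record, act."""
    description = describe_screenshot(screenshot)
    action = choose_action(description, history)
    history.append((description, action))
    return action
```

A "more complex prompting strategy involving multiple model calls", as the abstract puts it, would replace the single `choose_action` call with several chained calls (e.g. a planner proposing a goal and an executor picking the keypress), while keeping the same outer loop.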