Generating Code World Models with Large Language Models Guided by Monte Carlo Tree Search (2405.15383v2)
Abstract: In this work, we consider Code World Models: world models generated by an LLM in the form of Python code for model-based Reinforcement Learning (RL). Calling code instead of LLMs for planning has the potential to be more precise, reliable, interpretable, and extremely efficient. However, writing an appropriate Code World Model requires the ability to understand complex instructions, to generate exact code with non-trivial logic, and to self-debug a long program with feedback from unit tests and environment trajectories. To address these challenges, we propose Generate, Improve and Fix with Monte Carlo Tree Search (GIF-MCTS), a new code generation strategy for LLMs. To test our approach in an offline RL setting, we introduce the Code World Models Benchmark (CWMB), a suite of program synthesis and planning tasks comprising 18 diverse RL environments paired with corresponding textual descriptions and curated trajectories. GIF-MCTS surpasses all baselines on the CWMB and two other benchmarks, and we show that the Code World Models synthesized with it can be successfully used for planning, resulting in model-based RL agents with greatly improved sample efficiency and inference speed.
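To make the idea concrete, here is a minimal sketch of what a Code World Model and a planner built on top of it might look like. This is not the paper's implementation: the toy dynamics, the `step`/`plan` names, and the exhaustive-lookahead planner are all illustrative assumptions; the point is only that once the model is plain Python, a planner can query it cheaply instead of calling an LLM at every step.

```python
def step(state, action):
    """Hypothetical LLM-generated world model for a toy 1-D environment.
    state = (position, velocity); action in {0, 1} (push left / push right)."""
    pos, vel = state
    vel += 0.1 if action == 1 else -0.1
    pos += vel
    reward = 1.0 if abs(pos) < 1.0 else 0.0  # reward for staying in bounds
    done = abs(pos) >= 1.0                   # episode ends when out of bounds
    return (pos, vel), reward, done


def plan(state, depth=3, actions=(0, 1)):
    """Exhaustive lookahead over the code world model; returns the first
    action of the highest-return action sequence up to `depth` steps."""
    def best_return(s, d):
        if d == 0:
            return 0.0
        best = float("-inf")
        for a in actions:
            ns, r, done = step(s, a)
            total = r if done else r + best_return(ns, d - 1)
            best = max(best, total)
        return best

    scores = {}
    for a in actions:
        ns, r, done = step(state, a)
        scores[a] = r if done else r + best_return(ns, depth - 1)
    return max(scores, key=scores.get)


if __name__ == "__main__":
    print(plan((0.0, 0.0)))
```

Because `step` is ordinary code, each simulated transition costs microseconds rather than an LLM call, which is the source of the sample-efficiency and inference-speed gains the abstract claims; the hard part, addressed by GIF-MCTS, is generating and debugging a `step` function that is actually faithful to the environment.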