Monte Carlo Planning with Large Language Model for Text-Based Game Agents (2504.16855v1)

Published 23 Apr 2025 in cs.CL

Abstract: Text-based games provide valuable environments for language-based autonomous agents. However, planning-then-learning paradigms, such as those combining Monte Carlo Tree Search (MCTS) and reinforcement learning (RL), are notably time-consuming due to extensive iterations. Additionally, these algorithms perform uncertainty-driven exploration but lack language understanding and reasoning abilities. In this paper, we introduce the Monte Carlo planning with Dynamic Memory-guided LLM (MC-DML) algorithm. MC-DML leverages the language understanding and reasoning capabilities of LLMs alongside the exploratory advantages of tree search algorithms. Specifically, we enhance LLMs with in-trial and cross-trial memory mechanisms, enabling them to learn from past experiences and dynamically adjust action evaluations during planning. We conduct experiments on a series of text-based games from the Jericho benchmark. Our results demonstrate that the MC-DML algorithm significantly enhances performance across various games at the initial planning phase, outperforming strong contemporary methods that require multiple iterations. This demonstrates the effectiveness of our algorithm, paving the way for more efficient language-grounded planning in complex environments.

Monte Carlo Planning with LLM for Text-Based Game Agents

The paper introduces an approach for improving planning efficiency in text-based game agents by integrating LLMs with Monte Carlo Tree Search (MCTS). The resulting Monte Carlo planning with Dynamic Memory-guided LLM (MC-DML) algorithm draws on the reasoning and language-understanding abilities of LLMs to address the difficulties that traditional MCTS faces in uncertain environments such as text-based games.

Introduction

Text-based games offer a rich environment for studying NLP and sequential decision-making. Agents in these games act through textual commands and must contend with large, dynamic state spaces and sparse rewards. Traditional reinforcement learning (RL) and Monte Carlo Tree Search (MCTS) methods applied to these games are constrained both by their lack of language reasoning ability and by the extensive computation their iterative learning requires. This paper proposes to address both limitations by integrating LLMs, whose language priors enable strong plans from the very first round of planning.

Methodology

The MC-DML algorithm models planning in text-based games as a Partially Observable Markov Decision Process (POMDP), integrating LLMs equipped with dynamic memory mechanisms into MCTS planning. The algorithm unfolds in the standard four phases: selection, expansion, simulation, and backpropagation. During expansion, the LLM acts as a prior policy within the PUCT framework, assigning search priorities to candidate actions based on the current state trajectory and memory accumulated from previous trials. This mechanism allows for real-time adaptation and learning from past failures through both in-trial and cross-trial memory.
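To make the action-selection mechanism concrete, here is a minimal Python sketch of PUCT selection with LLM-supplied priors. The `Child` structure, its field names, and the `c_puct` default are illustrative assumptions rather than the paper's implementation; the prior `P` would come from prompting the LLM with the current trajectory and memory.

```python
import math
from dataclasses import dataclass

@dataclass
class Child:
    P: float        # hypothetical: LLM-assigned prior probability for this action
    N: int = 0      # visit count
    Q: float = 0.0  # mean simulation return

def puct_select(children: dict[str, Child], c_puct: float = 1.0) -> str:
    """Return the action maximizing Q(s,a) + c_puct * P(a|s) * sqrt(N(s)) / (1 + N(s,a))."""
    total_visits = sum(c.N for c in children.values())
    best_action, best_score = None, float("-inf")
    for action, child in children.items():
        score = child.Q + c_puct * child.P * math.sqrt(total_visits) / (1 + child.N)
        if score > best_score:
            best_action, best_score = action, score
    return best_action
```

A higher LLM prior keeps a plausible action attractive even before it has been visited, which is how the language model steers exploration toward promising branches of the tree.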

A noteworthy aspect of the MC-DML approach is its ability to simulate a human player's decision-making process by mirroring short-term and long-term memory retention. This facilitates dynamic adjustments in action exploration and evaluation, contributing to more nuanced and strategically informed gameplay performance.
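As a rough illustration of how the two memory scopes might feed the LLM, the following sketch assembles a prompt from in-trial history and cross-trial reflections. The `DynamicMemory` class and the prompt layout are hypothetical, meant only to show the shape of the mechanism, not the paper's exact prompts.

```python
from dataclasses import dataclass, field

@dataclass
class DynamicMemory:
    """Hypothetical container for the two memory scopes described above."""
    in_trial: list[str] = field(default_factory=list)     # observations/actions from the current trial
    cross_trial: list[str] = field(default_factory=list)  # reflections distilled from earlier failed trials

    def to_prompt(self, observation: str, actions: list[str]) -> str:
        """Build an LLM prompt that conditions action evaluation on both memories."""
        reflections = "\n".join(f"- {r}" for r in self.cross_trial) or "- (none yet)"
        history = "\n".join(self.in_trial[-10:])  # recent steps only, to keep the prompt short
        choices = "\n".join(f"{i}. {a}" for i, a in enumerate(actions, 1))
        return (
            f"Lessons from previous trials:\n{reflections}\n\n"
            f"Recent trajectory:\n{history}\n\n"
            f"Current observation:\n{observation}\n\n"
            f"Rate how promising each action is:\n{choices}"
        )
```

The in-trial list plays the role of short-term memory within a single playthrough, while the cross-trial list persists across restarts, which is what lets the planner avoid repeating earlier failures.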

Experimentation and Results

Experiments on a suite of text-based games from the Jericho benchmark illustrate the efficacy of the MC-DML algorithm. The results show that MC-DML significantly improves game scores in the initial planning phase compared with contemporary methods that require multiple iterations. The algorithm's dynamic memory markedly strengthens its decision-making, allowing it to surpass even the strongest game agents on challenging games such as Zork1 and Deephome.

The paper provides a detailed comparative analysis against multiple baselines, ranging from RL-based techniques to other MCTS- and LLM-based strategies. MC-DML's superior performance stems largely from efficiently passing bottleneck states that traditional methods struggle to overcome.

Implications and Future Work

Integrating LLMs into MCTS planning opens new pathways for autonomous agents in text-based games and other complex environments, and it suggests broader applicability of language-based reasoning in AI planning tasks. The work offers both a practical advance, namely improved planning efficiency, and a theoretical insight into how LLMs behave in dynamic and uncertain settings.

Future research could explore more versatile memory storage and retrieval mechanisms within LLMs to improve in-trial memory utilization. The paper lays the groundwork for further developments in the strategic planning of AI agents, emphasizing the symbiotic relationship between language understanding and decision-making.

More broadly, MC-DML highlights the growing relevance of LLMs in adaptive planning and makes a compelling case for applying them beyond conventional language tasks, as a means of broadening AI exploration strategies.

Authors (3)
  1. Zijing Shi (7 papers)
  2. Meng Fang (100 papers)
  3. Ling Chen (144 papers)