
Strategist: Learning Strategic Skills by LLMs via Bi-Level Tree Search (2408.10635v2)

Published 20 Aug 2024 in cs.AI and cs.CL

Abstract: In this paper, we propose a new method STRATEGIST that utilizes LLMs to acquire new skills for playing multi-agent games through a self-improvement process. Our method gathers quality feedback through self-play simulations with Monte Carlo tree search and LLM-based reflection, which can then be used to learn high-level strategic skills such as how to evaluate states that guide the low-level execution. We showcase how our method can be used in both action planning and dialogue generation in the context of games, achieving good performance on both tasks. Specifically, we demonstrate that our method can help train agents with better performance than both traditional reinforcement learning-based approaches and other LLM-based skill learning approaches in games including the Game of Pure Strategy (GOPS) and The Resistance: Avalon. STRATEGIST helps bridge the gap between foundation models and symbolic decision-making methods through its bi-level approach, leading to more robust decision-making.

Authors (8)
  1. Jonathan Light (9 papers)
  2. Min Cai (14 papers)
  3. Weiqin Chen (11 papers)
  4. Guanzhi Wang (14 papers)
  5. Xiusi Chen (36 papers)
  6. Wei Cheng (175 papers)
  7. Yisong Yue (154 papers)
  8. Ziniu Hu (51 papers)
Citations (1)

Summary

  • The paper introduces a novel bi-level framework that couples LLMs with MCTS to improve strategic decision-making in multi-agent games.
  • It employs LLM-guided reflections and evolutionary self-play to iteratively refine high-level strategies into actionable policies.
  • Experimental results on GOPS and Avalon highlight superior performance over traditional RL approaches with reduced computational resources.

Overview of "Strategist: Learning Strategic Skills by LLMs via Bi-Level Tree Search"

The paper "Strategist: Learning Strategic Skills by LLMs via Bi-Level Tree Search" presents a bi-level framework for strategic decision-making that integrates LLMs with symbolic decision-making methods. The proposed method, Strategist, is a self-improvement process through which LLMs acquire new skills for multi-agent games: LLM-guided reflection is combined with Monte Carlo Tree Search (MCTS) to generate high-level strategic abstractions, which are then refined into executable policies.

Methodological Insights

Strategist constructs strategy trees through an evolutionary process, iteratively refining strategies with LLMs that generate feedback from self-play simulations. The bi-level framework splits strategy learning into a high level, where the LLM develops interpretable strategic abstractions, and a low level, where MCTS refines these abstractions into actionable policies. The approach is adaptive and does not require large amounts of training data, leveraging the generalization capabilities of LLMs and the precise planning afforded by MCTS.
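To make this division of labor concrete, below is a minimal Python sketch of the simulate-reflect-refine loop, with the LLM reflection step and the MCTS self-play evaluation stubbed out. All names here (Strategy, propose_refinement, evaluate_by_self_play, improve) are illustrative assumptions, not identifiers from the paper.

```python
# Minimal sketch of the bi-level loop, assuming a strategy is represented
# as a natural-language rationale plus a state-evaluation heuristic.
import random
from dataclasses import dataclass
from typing import Callable

@dataclass
class Strategy:
    description: str                     # LLM-written rationale (high level)
    value_fn: Callable[[object], float]  # state-evaluation heuristic

def propose_refinement(strategy: Strategy, feedback: float) -> Strategy:
    """Stub for the high-level LLM reflection step: in a real system an LLM
    would read the self-play feedback and rewrite the strategy."""
    shift = random.uniform(-0.1, 0.1)
    return Strategy(
        description=strategy.description + " (refined)",
        value_fn=lambda s, d=shift: strategy.value_fn(s) + d,
    )

def evaluate_by_self_play(strategy: Strategy, opponent: Strategy,
                          n_games: int = 50) -> float:
    """Stub for the low-level evaluation: MCTS guided by each side's value
    function would play n_games; here the outcome is a placeholder coin flip."""
    return sum(random.random() < 0.5 for _ in range(n_games)) / n_games

def improve(strategy: Strategy, opponent: Strategy, rounds: int = 10) -> Strategy:
    """Bi-level loop: simulate (low level), then reflect and refine (high
    level), keeping a refinement only if it raises the self-play win rate."""
    for _ in range(rounds):
        win_rate = evaluate_by_self_play(strategy, opponent)
        candidate = propose_refinement(strategy, feedback=win_rate)
        if evaluate_by_self_play(candidate, opponent) > win_rate:
            strategy = candidate
    return strategy
```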

The framework introduces a modular method of strategy improvement based on population-based self-play and a queue of candidate improvement ideas. Incrementally implementing and evaluating these ideas, rather than committing to a single line of refinement, helps the search escape local maxima. Head-to-head comparison of strategies during self-play then drives a robust evolutionary selection, retaining the strategies that perform best.
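A minimal sketch of this idea-queue mechanism is shown below, assuming that improvement ideas are callables transforming one strategy into another and that strategies can be scored by a self-play win rate. The priority scheme and function names are assumptions for illustration, not the paper's implementation.

```python
# Sketch of population-based improvement with a priority queue of ideas.
import heapq
import itertools
import random
from typing import Callable, List

_tiebreak = itertools.count()  # keeps heap entries comparable on priority ties

def push_idea(queue: list, priority: float, idea: Callable) -> None:
    """Higher expected gain comes out first (min-heap on negated priority)."""
    heapq.heappush(queue, (-priority, next(_tiebreak), idea))

def evolve(population: List, idea_queue: list,
           win_rate: Callable, generations: int = 10, keep: int = 4) -> List:
    """Pop the most promising idea, apply it to a random strategy, and keep
    the variant only if it wins more often in self-play. Ideas not yet tried
    stay queued, which helps the search escape local maxima."""
    for _ in range(generations):
        if not idea_queue:
            break
        _, _, idea = heapq.heappop(idea_queue)
        parent = random.choice(population)
        child = idea(parent)
        if win_rate(child) > win_rate(parent):
            population.append(child)
    # rank by self-play performance and retain the strongest strategies
    return sorted(population, key=win_rate, reverse=True)[:keep]
```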

Experimental Results

Strategist was validated experimentally on two complex games: the Game of Pure Strategy (GOPS) and The Resistance: Avalon. Both combine strategic depth with natural-language interaction, making them a demanding testbed for the proposed method. Strategist outperformed traditional reinforcement learning (RL) methods, such as AlphaGo-like approaches, while using substantially fewer computational resources, highlighting how efficiently the bi-level framework explores and refines strategies without extensive data or computation.

In GOPS, a card game known for its combinatorial depth, Strategist outperformed deep RL baselines, demonstrating improved policy learning through its strategic abstraction process. In Avalon, a multi-agent role-playing game built around social deduction and complex interaction dynamics, Strategist likewise achieved better gameplay outcomes than the baselines. Notably, it learned robust dialogue strategies guided by structured chain-of-thought reasoning, which it used to steer in-game discussions effectively.
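As an illustration only, a structured chain-of-thought dialogue prompt for a social-deduction game might look like the hypothetical template below; the fields and reasoning steps are assumptions, not the paper's actual prompt.

```python
# Hypothetical structured chain-of-thought template for in-game dialogue;
# the reasoning steps and field names are illustrative assumptions.
DIALOGUE_PROMPT = """You are playing {role} in The Resistance: Avalon.
Game state: {state_summary}
Discussion so far: {discussion}

Reason step by step before speaking:
1. What do the other players likely believe about my role?
2. Which claims made so far are mutually inconsistent?
3. What should I reveal, conceal, or imply this turn?

Now write one in-character discussion message."""

def build_dialogue_prompt(role: str, state_summary: str, discussion: str) -> str:
    """Fill the template; the result would be sent to an LLM, whose
    completion serves as the agent's next in-game utterance."""
    return DIALOGUE_PROMPT.format(
        role=role, state_summary=state_summary, discussion=discussion
    )
```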

Implications and Future Directions

The implications of Strategist extend beyond gameplay into broader domains requiring strategic decision-making. By integrating LLMs with symbolic methods, the framework points toward more autonomous and generalized AI systems capable of strategic reasoning across varied applications, with potential advances in negotiation, collaborative tasks, and complex planning scenarios.

Future developments might extend the functionality of Strategist to incorporate more advanced forms of environmental interaction and adversarial settings, possibly involving more nuanced dialogues and complex reasoning tasks. Additionally, further exploration into different high-level abstractions and their influences on policy refinement could uncover new strategic insights.

Overall, "Strategist: Learning Strategic Skills by LLMs via Bi-Level Tree Search" presents a compelling advancement in leveraging LLMs for strategic reasoning, showcasing robust performance in multi-agent environments and suggesting promising avenues for AI research and applications.
