PokéLLMon: A Human-Parity Agent for Pokémon Battles with Large Language Models (2402.01118v3)

Published 2 Feb 2024 in cs.AI and cs.CL

Abstract: We introduce PokéLLMon, the first LLM-embodied agent that achieves human-parity performance in tactical battle games, as demonstrated in Pokémon battles. The design of PokéLLMon incorporates three key strategies: (i) in-context reinforcement learning, which instantly consumes text-based feedback derived from battles to iteratively refine the policy; (ii) knowledge-augmented generation, which retrieves external knowledge to counteract hallucination and enables the agent to act in a timely and proper manner; and (iii) consistent action generation, which mitigates the panic-switching phenomenon when the agent faces a powerful opponent and wants to elude the battle. We show that online battles against humans demonstrate PokéLLMon's human-like battle strategies and just-in-time decision making, achieving a 49% win rate in Ladder competitions and a 56% win rate in invited battles. Our implementation and playable battle logs are available at: https://github.com/git-disl/PokeLLMon.

Summary

  • The paper introduces PokeLLMon, an LLM-based agent that achieves human parity in Pokémon battles through in-context reinforcement learning.
  • It employs knowledge-augmented generation and a consistent action voting mechanism to reduce hallucinations and erratic moves.
  • Empirical results show win rates of 49% in Ladder competitions and 56% in invited battles, underscoring its strategic effectiveness.

Introduction

The emergence of LLMs presents intriguing possibilities beyond natural language processing tasks, stretching into the field of gameplay, where strategic thinking and decision-making are paramount. Developed by researchers at the Georgia Institute of Technology, PokéLLMon represents a novel foray into this interdisciplinary field as the first LLM-embodied agent to achieve human parity in tactical battle games, specifically in the context of Pokémon battles.

Key Strategies in PokéLLMon

What sets PokéLLMon apart are its foundational strategies, designed to tackle the challenges LLMs face in strategic gameplay. The agent deploys in-context reinforcement learning (ICRL), which leverages immediate text-based feedback from game outcomes to refine its action strategy in real time, without additional training. This feedback acts as a form of "reward", a concept borrowed from reinforcement learning, albeit delivered in text form, which is more congruent with an LLM's strengths.
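A minimal sketch of this loop in Python, assuming hypothetical `llm` and `env` interfaces (`llm.generate`, `env.observe`, `env.step`, `env.battle_over`) that stand in for the paper's actual battle-server client:

```python
def battle_loop(llm, env, max_turns=100):
    """In-context RL: each turn's prompt carries text feedback from prior turns."""
    feedback_log = []  # textual "rewards": move effectiveness, HP changes, faints
    for _ in range(max_turns):
        state = env.observe()  # current battle state rendered as text
        prompt = (
            "You are playing a Pokémon battle.\n"
            f"Current state:\n{state}\n"
            "Feedback from previous turns:\n"
            + "\n".join(feedback_log[-4:])
            + "\nChoose the best action (a move or a switch):"
        )
        action = llm.generate(prompt)
        # The environment's text response serves as the reward signal,
        # e.g. "Thunderbolt was not very effective; the opponent lost 12% HP."
        feedback_log.append(env.step(action))
        if env.battle_over():
            break
```

No gradient updates occur anywhere in this loop; the "learning" happens entirely in context, by conditioning the next decision on the accumulated feedback text.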

In the struggle against a tendency known as 'hallucination', where an agent might persist with ineffective moves or misjudge type advantages, knowledge-augmented generation (KAG) steps in. Utilizing a form of external database akin to the Pokémon games' Pokédex, KAG effectively reduces hallucinations by providing the agent with essential information about Pokémon types, move effects, and abilities.
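As a rough illustration of how such retrieved facts can be injected into the prompt, consider the sketch below; the `TYPE_CHART` is a tiny hand-written stand-in for the external Pokédex-like source, and all function names are hypothetical:

```python
# Tiny stand-in knowledge base: attacking type -> defending type -> multiplier.
# The real agent retrieves this kind of fact from an external Pokédex-like source.
TYPE_CHART = {
    "electric": {"water": 2.0, "flying": 2.0, "ground": 0.0, "grass": 0.5},
    "fire": {"grass": 2.0, "water": 0.5, "rock": 0.5},
}

def knowledge_snippet(move_type, defender_types):
    """Render retrieved type-effectiveness facts as plain text for the prompt."""
    mult = 1.0
    for t in defender_types:
        mult *= TYPE_CHART.get(move_type, {}).get(t, 1.0)
    return f"{move_type.title()}-type moves deal {mult}x damage to {'/'.join(defender_types)}."

def augmented_prompt(battle_state, move_type, defender_types):
    # Prepending verified facts lets the LLM reason over them rather than
    # hallucinating type matchups from its parametric memory.
    return knowledge_snippet(move_type, defender_types) + "\n" + battle_state
```

For example, `knowledge_snippet("electric", ["water", "flying"])` yields "Electric-type moves deal 4.0x damage to water/flying.", grounding the agent's move choice in retrieved fact rather than recall.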

There's also a phenomenon described as 'panic switching', where agents react inconsistently under pressure by making erratic changes in their choice of Pokémon. The paper addresses this issue with a technique called consistent action generation, which reduces the likelihood of such erratic behavior by adopting a voting mechanism across multiple independent action predictions to determine the most stable course of action.
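A minimal sketch of such a majority-vote scheme, again assuming a hypothetical `llm.generate` sampler:

```python
from collections import Counter

def consistent_action(llm, prompt, k=3):
    """Sample k independent action predictions and return the majority vote."""
    votes = [llm.generate(prompt, temperature=1.0) for _ in range(k)]
    action, _count = Counter(votes).most_common(1)[0]
    return action
```

Because the votes are drawn independently, a single erratic, fear-driven prediction is outvoted by the more typical responses, which is exactly the stabilizing effect the paper is after.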

Empirical Findings & Performance Metrics

PokéLLMon was put to the test in online battles against human players, navigating the complexities of real-time gameplay. The empirical results are notable: the agent managed a 49% win rate in Ladder competitions and an even higher 56% win rate in invited battles. The research showcases PokéLLMon's capability to strategize like a human player, dynamically adjusting its approach in response to the unfolding context of each battle.

Challenges and Future Directions

While PokéLLMon stands as an embodiment of LLM prowess in strategic games, the research also lays bare the agent's susceptibility to advanced human strategies, like attrition and deception. Human-like attrition tactics involve surviving and outlasting the opponent, a concept that the agent has yet to master fully. Similarly, human players can successfully employ deceptive maneuvers that lead the agent astray, a challenge that may necessitate future models to account for opponent behavior prediction.

Conclusion

The PokéLLMon paper marks a significant step forward in blending the capabilities of LLMs with the strategic, rule-based world of gaming. By introducing techniques like ICRL, KAG, and consistent action generation, the researchers have chiseled a pathway toward creating AI that doesn't merely react but strategizes with a nuanced understanding of its environment. These innovations may soon bridge the gap between AI and human cognition in complex decision-making arenas such as gaming.
