
LLM-empowered MMOAgent

Updated 11 July 2025
  • LLM-empowered MMOAgent is a modular multi-agent framework that employs large language models for adaptive decision-making and complex social interactions in MMO settings.
  • The system integrates memory, analysis, planning, and action modules to simulate human-like collaboration, leadership, persuasion, and deception.
  • Adaptive evaluation metrics and iterative learning cycles enable agents to refine strategies and optimize performance in dynamic multiplayer environments.

An LLM-empowered MMOAgent refers to an LLM-based autonomous agent or society of agents designed for massively multiplayer online (MMO) settings, capable of complex social, communicative, and strategic behaviors. Such agents leverage the reasoning, memory, planning, and adaptive decision-making capabilities of LLMs to operate in dynamic multi-agent systems, as explored through detailed frameworks in interactive environments such as Avalon gameplay (Lan et al., 2023).

1. Modular Framework and Agent Architecture

The LLM-empowered MMOAgent is structured as a modular multi-agent system enabling human-like decision-making and intricate social interaction. Each agent is driven by a role-specific system prompt, encompassing information such as agent role, winning objectives, and high-level strategies. The system is organized into several modules:

  • Memory Storage and Summarization: Agents systematically record, summarize, and retain interaction histories. The memory state at time $t$, for agent $p_i$, is maintained according to:

$$M_t = \langle \text{SMR}(M_{t-1}), (R_t^{(p_1)}, \ldots, R_t^{(p_6)}, I_t) \rangle$$

where $\text{SMR}$ is a summarization function, $R_t^{(p_j)}$ is the response of agent $p_j$ at time $t$, and $I_t$ is relevant information from the environment.

  • Analysis Module: Agents analyze their memory and scenario-specific role information to formulate hypotheses regarding others’ identities and strategies:

$$H_t^{(p_i)} = \text{ANA}(M_t, RI^{(p_i)})$$

Here, $RI^{(p_i)}$ denotes role information.

  • Planning Module: Each agent adapts their strategic plans based on memory, analysis, prior plans, objectives, and game state:

$$P_t^{(p_i)} = \text{PLAN}(M_t, H_t^{(p_i)}, P_{t-1}^{(p_i)}, RI^{(p_i)}, G^{(p_i)}, S^{(p_i)})$$

  • Action Module: Agents select actions (e.g., proposing teams, voting, quest participation) by sampling from a probability distribution conditioned on all prior modules:

$$A_t^{(p_i)} \sim p(A \mid M_t, H_t^{(p_i)}, P_t^{(p_i)}, RI^{(p_i)}, G^{(p_i)}, S^{(p_i)}, I'_t)$$

  • Response Generation and Communication: Agents generate text responses for negotiation, persuasion, deception, or debate, supporting rich social interactions.

This modular composition underpins efficient communication, clear separation of concerns, and flexibility in handling diverse MMO tasks.
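A minimal sketch of how this modular pipeline could be realized is shown below. The `call_llm` helper, the prompt wording, and the `MMOAgent` field names are illustrative assumptions rather than the published implementation; each method mirrors one of the update equations above.

```python
from dataclasses import dataclass

def call_llm(prompt: str) -> str:
    """Placeholder for a call to the underlying LLM (assumed helper)."""
    raise NotImplementedError

@dataclass
class MMOAgent:
    role_prompt: str   # role, winning objective, and high-level strategy (RI, G)
    memory: str = ""   # summarized interaction history M_t
    plan: str = ""     # current strategic plan P_t

    def update_memory(self, responses: list[str], env_info: str) -> None:
        # M_t = < SMR(M_{t-1}), (R_t^{(p_1)}, ..., R_t^{(p_6)}, I_t) >
        self.memory = call_llm(
            "Summarize the prior memory together with the new events.\n"
            f"Prior memory: {self.memory}\n"
            f"New responses: {responses}\nEnvironment info: {env_info}"
        )

    def analyze(self) -> str:
        # H_t = ANA(M_t, RI)
        return call_llm(
            f"{self.role_prompt}\nMemory: {self.memory}\n"
            "Hypothesize the identities and strategies of the other players."
        )

    def plan_step(self, hypothesis: str, objective: str, game_state: str) -> None:
        # P_t = PLAN(M_t, H_t, P_{t-1}, RI, G, S)
        self.plan = call_llm(
            f"{self.role_prompt}\nMemory: {self.memory}\nHypothesis: {hypothesis}\n"
            f"Previous plan: {self.plan}\nObjective: {objective}\n"
            f"Game state: {game_state}\nProduce an updated strategic plan."
        )

    def act(self, hypothesis: str, game_state: str, extra_info: str) -> str:
        # A_t ~ p(A | M_t, H_t, P_t, RI, G, S, I'_t): the LLM's sampled
        # completion serves as a draw from the context-conditioned distribution.
        return call_llm(
            f"{self.role_prompt}\nMemory: {self.memory}\nHypothesis: {hypothesis}\n"
            f"Plan: {self.plan}\nGame state: {game_state}\nAdditional info: {extra_info}\n"
            "Choose the next action (propose team / vote / quest / speak)."
        )
```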

2. Social Behavior Modelling: Collaboration and Confrontation

LLM-empowered MMOAgents are explicitly engineered to simulate both constructive and adversarial behaviors in socially complex environments:

  • Teamwork: Agents (especially those on the 'good' side) share information, build trust, and coordinate actions to maximize collective utility.
  • Leadership: Measured by metrics like the Leader Approval Rate, some agents emerge as leaders, steering discourse and group action.
  • Persuasion: Agents deploy self-recommendation or influence strategies to shape team formation, with corresponding success tracked quantitatively.
  • Deception and Camouflage: 'Evil' side agents, and pressured 'good' agents, may conceal intentions or provide misleading cues, with their frequency and timing tracked (e.g., self-disclosure, camouflage metrics).
  • Confrontation and Debate: The agents engage in objections, suspicion-driven exchanges, and other adversarial dialogues reflecting distrust and strategy.

Dialogues and actions are parsed and classified (with the aid of external LLMs like ChatGPT) to objectively evaluate occurrences of collaboration, confrontation, and information sharing.
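A plausible realization of this classification step is sketched below, assuming a generic `call_llm` helper that wraps an external chat-completion service; the label set and prompt wording are illustrative rather than the authors' exact taxonomy.

```python
BEHAVIOR_LABELS = ["teamwork", "leadership", "persuasion", "deception", "confrontation"]

def classify_utterance(call_llm, utterance: str) -> str:
    """Ask an external LLM to label a single in-game utterance.

    `call_llm` is an assumed helper wrapping a chat-completion API.
    """
    prompt = (
        "Classify the following Avalon game utterance into exactly one of "
        f"these categories: {', '.join(BEHAVIOR_LABELS)}.\n"
        f"Utterance: {utterance}\n"
        "Answer with the category name only."
    )
    label = call_llm(prompt).strip().lower()
    return label if label in BEHAVIOR_LABELS else "other"

def tally_behaviors(call_llm, transcript: list[str]) -> dict:
    """Aggregate per-category counts over a full game transcript."""
    counts = {label: 0 for label in BEHAVIOR_LABELS + ["other"]}
    for utterance in transcript:
        counts[classify_utterance(call_llm, utterance)] += 1
    return counts
```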

3. Evaluation Metrics and Adaptive Performance

Agent performance is measured through both direct gameplay outcomes and proxies for intelligent/social behavior:

  • Winning Rate (WR): Proportion of games won by an agent or side.
  • Quest Engagement Rate (QER): Percentage of rounds involving successful quest participation.
  • Failure Vote Rate (FVR): Proportion of votes explicitly cast to sabotage or fail team actions.
  • Social Behavior Metrics: Including Leader Approval Rate, self-recommendation rate, success rate of proposals, and quantitative analysis of deception or confrontation.

Evaluations are performed by parsing low-level logs as well as employing LLM-based classifiers on communication records. Such multi-level metrics enable both quantitative and qualitative assessment of agent adaptability, influence, and strategic depth.
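Under an assumed log schema (the field names `winner`, `quest_team`, `quest_succeeded`, `player`, and `vote` are hypothetical), the gameplay metrics reduce to simple ratios over parsed logs:

```python
def winning_rate(games: list[dict], side: str) -> float:
    """WR: fraction of games won by the given side."""
    wins = sum(1 for g in games if g["winner"] == side)
    return wins / len(games) if games else 0.0

def quest_engagement_rate(rounds: list[dict], player: str) -> float:
    """QER: fraction of rounds in which the player took part in a successful quest."""
    engaged = sum(1 for r in rounds
                  if player in r["quest_team"] and r["quest_succeeded"])
    return engaged / len(rounds) if rounds else 0.0

def failure_vote_rate(votes: list[dict], player: str) -> float:
    """FVR: fraction of the player's quest votes cast to fail the quest."""
    own = [v for v in votes if v["player"] == player]
    fails = sum(1 for v in own if v["vote"] == "fail")
    return fails / len(own) if own else 0.0
```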

4. Mechanisms for Agent Adaptation and Experience Learning

Adaptivity in the MMOAgent is achieved through cyclical integration of observation, memory, analysis, planning, and self-improvement:

  • The memory module compresses interaction history via summarization, ensuring relevance and scalability as the game context evolves.
  • The experience learning component aggregates both self-assessment and opponent modeling; agents iteratively refine strategies by learning from both their own and adversarial behaviors.
  • Planning is dynamically updated in light of observed outcomes and accumulated knowledge, allowing agents to adjust to evolving game states and group dynamics.

Formally, this adaptivity is characterized by the interlinked update cycle:

$$\begin{align*}
M_t &= \langle \text{SMR}(M_{t-1}), (R_t^{(p_1)}, \ldots, R_t^{(p_6)}, I_t) \rangle \\
H_t^{(p_i)} &= \text{ANA}(M_t, RI^{(p_i)}) \\
P_t^{(p_i)} &= \text{PLAN}(M_t, H_t^{(p_i)}, P_{t-1}^{(p_i)}, RI^{(p_i)}, G^{(p_i)}, S^{(p_i)})
\end{align*}$$

Actions are drawn from a context-sensitive distribution, and repeated cycles enable the agent to self-correct and strategically innovate.
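Building on the hypothetical `MMOAgent` sketch in Section 1, this interlinked update cycle can be driven by a per-round loop such as the following; the `env` interface (method names and return values) is an assumption made for illustration, not part of the published framework.

```python
def play_round(agents, env):
    """One observe -> remember -> analyze -> plan -> act cycle per agent."""
    responses = env.last_responses()   # R_t^{(p_1)}, ..., R_t^{(p_6)}
    env_info = env.public_info()       # I_t
    for agent in agents:
        agent.update_memory(responses, env_info)                  # M_t
        hypothesis = agent.analyze()                               # H_t
        agent.plan_step(hypothesis, env.objective(agent),          # P_t
                        env.state_summary(agent))
        action = agent.act(hypothesis, env.state_summary(agent),   # A_t ~ p(A | ...)
                           env.private_info(agent))
        env.apply(agent, action)
```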

5. Applications and Generalization to Online Environments

While the described framework is instantiated in the context of Avalon, its abstraction and modularity lend themselves to broader MMO contexts:

  • Social Virtual Worlds: Agents simulate negotiation, alliance formation, and betrayal, supporting research into emergent social structures in online environments.
  • Online Game AI: The framework can underpin NPCs or agents that meaningfully navigate games characterized by incomplete information, dynamic alliances, and real-time interaction.
  • Collaborative and Adversarial Task Scenarios: Modules for memory, analysis, and planning can be tailored and extended for settings such as virtual economy simulations, policy debates, or complex team-based competitions.

Insights derived from metrics and module tuning can be applied to optimize the balance between collaboration and confrontation, promoting robust social behavior in high-dimensional multi-agent ecosystems.

6. Technical and Implementation Considerations

  • Modular Pipeline: Clearly separated modules for memory, analysis, planning, and action selection facilitate system scalability, maintenance, and experimentation with alternative reasoning or summarization methods.
  • Probabilistic Action Selection: Actions are sampled from distributions conditional on comprehensive context, supporting both unpredictable (deceptive/adversarial) and goal-rational behavior.
  • Summarization Strategies: Efficient memory summarization (SMR) is crucial for scaling to long-duration games or high-frequency interaction environments.
  • Experience Aggregation: Incorporating both self and opponent trajectories ensures that adaptation is grounded not only in personal achievement but also in opponent modeling for game-theoretic robustness.
  • Integration of External Classifiers: Using services like ChatGPT for labeling or classification of dialogue types leverages existing LLM infrastructure for enhanced evaluation.

Such technical features ensure that the framework can be generalized, scaled, and audited, making it suitable for research and commercial deployment in various MMO scenarios.
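As one illustration of the summarization strategy noted above, the SMR step could be approximated by a length-bounded LLM summarization call; the character budget and prompt wording below are assumptions, not values from the paper.

```python
MAX_MEMORY_CHARS = 4000  # illustrative budget; the framework does not prescribe one

def summarize_memory(call_llm, prior_memory: str, new_events: list[str]) -> str:
    """SMR: compress prior memory plus new events into a bounded summary."""
    combined = prior_memory + "\n" + "\n".join(new_events)
    if len(combined) <= MAX_MEMORY_CHARS:
        return combined  # nothing to compress yet
    return call_llm(
        "Summarize the following game history, preserving claimed identities, "
        "votes, quest outcomes, and stated suspicions:\n" + combined
    )
```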


In summary, the LLM-empowered MMOAgent framework developed for Avalon gameplay exemplifies a sophisticated, modular architecture capable of simulating both adaptive collaboration and confrontation. The system's emphasis on memory, planning, outcome-driven adaptation, and nuanced social behavior analysis underpins its utility as a blueprint for next-generation agents in virtual society simulations and interactive MMOs (Lan et al., 2023).

References

Lan, Y., et al. (2023). LLM-Based Agent Society Investigation: Collaboration and Confrontation in Avalon Gameplay.