Time to Talk: LLM Agents for Asynchronous Group Communication in Mafia Games (2506.05309v1)

Published 5 Jun 2025 in cs.MA, cs.AI, and cs.CL

Abstract: LLMs are used predominantly in synchronous communication, where a human user and a model communicate in alternating turns. In contrast, many real-world settings are inherently asynchronous. For example, in group chats, online team meetings, or social games, there is no inherent notion of turns; therefore, the decision of when to speak forms a crucial part of the participant's decision making. In this work, we develop an adaptive asynchronous LLM-agent which, in addition to determining what to say, also decides when to say it. To evaluate our agent, we collect a unique dataset of online Mafia games, including both human participants, as well as our asynchronous agent. Overall, our agent performs on par with human players, both in game performance, as well as in its ability to blend in with the other human players. Our analysis shows that the agent's behavior in deciding when to speak closely mirrors human patterns, although differences emerge in message content. We release all our data and code to support and encourage further research for more realistic asynchronous communication between LLM agents. This work paves the way for integration of LLMs into realistic human group settings, from assistance in team discussions to educational and professional environments where complex social dynamics must be navigated.

Summary

  • The paper introduces an LLM agent with a two-module design that strategically decides when to speak and how to craft messages in asynchronous settings.
  • It evaluates the agent using the LLMafia dataset, revealing human-like timing and message rates but distinct differences in message content.
  • The analysis highlights that balanced communication, measured by message quantity and timing, is crucial for successful integration in social game environments.

Asynchronous Communication with LLM Agents in Mafia Games

This paper introduces an LLM-based agent for asynchronous multi-party communication that decides not only what to say but also when to say it. The agent is evaluated alongside human players in the game of Mafia, which requires strategic communication and deception. The authors collected the LLMafia dataset of human-LLM interactions during Mafia games. They then analyzed the agent's message timing, quantity, content, and game performance. The agent mirrors human behavior in timing and quantity but differs in message content.

Agent Architecture and Implementation

The core of the agent is its two-module design: a scheduler and a generator (Figure 1). The scheduler determines whether to send a message at a given time, while the generator crafts the message content. Both modules are LLM-based. Each takes the group chat context as input, including the conversation setting, the participants, and the message history.

Figure 1: The agent's logic design leverages a scheduler to decide when to speak and a generator to craft the message, balancing talkativeness with strategic silence.
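The scheduler/generator split described above can be sketched as follows. This is a minimal illustration, not the authors' code. The names `ChatContext` and `agent_step`, and the prompt wording, are hypothetical. Each module is modeled as a plain function from prompt text to model output, so any LLM client could be plugged in.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical type: any function mapping a prompt string to model output text.
LLM = Callable[[str], str]


@dataclass
class ChatContext:
    """Group-chat context shared by both modules (setting, participants, history)."""
    setting: str
    participants: List[str]
    history: List[str] = field(default_factory=list)

    def render(self) -> str:
        return (
            f"Setting: {self.setting}\n"
            f"Participants: {', '.join(self.participants)}\n"
            "History:\n" + "\n".join(self.history)
        )


def agent_step(ctx: ChatContext, scheduler: LLM, generator: LLM) -> Optional[str]:
    """One scheduling decision: return a message to send, or None to stay silent."""
    # Scheduler: decide *whether* to speak right now (prompt wording is illustrative).
    decision = scheduler(ctx.render() + "\nShould you send a message now? Answer yes or no.")
    if not decision.strip().lower().startswith("yes"):
        return None
    # Generator: decide *what* to say, given the same context.
    return generator(ctx.render() + "\nWrite your next message.").strip()
```

Keeping the two calls separate lets the scheduler be polled often with a short yes/no prompt. The generation call, which produces longer output, only runs once the agent has decided to speak.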

To balance being talkative with staying quiet, the scheduler uses dynamic prompting based on the agent's message rate relative to the other participants. Let $n$ be the number of active participants. If the agent's message rate is lower than $1/n$, the prompt encourages it to speak more often. If the rate exceeds $1/n$, the prompt encourages a more passive, listening role. Typing time is simulated by a delay after each generated message, based on an average typing speed of one word per second, to better align with human behavior [dhakal2018observations].
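The dynamic prompting rule and the typing delay reduce to a few lines of arithmetic. The sketch assumes that "message rate" means the agent's share of all messages sent so far. The function names and hint wording are hypothetical. The $1/n$ threshold and the one-word-per-second typing speed come from the text.

```python
def scheduler_hint(agent_messages: int, total_messages: int, n_active: int) -> str:
    """Choose an extra scheduler instruction from the agent's share of messages.

    Assumption: message rate = agent messages / all messages so far.
    The hint strings are illustrative, not the paper's prompts.
    """
    if n_active <= 0:
        raise ValueError("need at least one active participant")
    rate = agent_messages / total_messages if total_messages else 0.0
    fair_share = 1 / n_active
    if rate < fair_share:
        return "You have been quiet; consider contributing to the discussion."
    if rate > fair_share:
        return "You have been talkative; consider listening to others for now."
    return ""  # exactly at the fair share: no extra nudge


def typing_delay_seconds(message: str, words_per_second: float = 1.0) -> float:
    """Simulated typing time: one second per word at the default speed."""
    return len(message.split()) / words_per_second
```

In an asyncio-based client, the delay could be applied with `await asyncio.sleep(typing_delay_seconds(msg))`.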

Experimental Setup: The LLMafia Dataset

The asynchronous agent is evaluated within the game of Mafia, a social deduction game in which players try to identify the mafia members through discussion and voting (Figure 2).

Figure 2: A snapshot of an online Mafia game, where the LLM-agent interacts with human players, simulating real-time decision-making on when to contribute to the conversation.

The game's dynamics demand strategic communication, since talking too much or too little can raise suspicion. The rules of the game are summarized in Figure 3.

Figure 3: A flowchart outlining the rules of Mafia, where players must deduce each other's roles through asynchronous communication and strategic voting.

The LLMafia dataset consists of 21 games totaling 2558 messages, an average of 121.81 messages per game. It includes messages and votes from all players, along with timestamps and agent-related records such as prompts. Each game involves 7 to 12 players, including one LLM-agent. Human participants were told that an AI agent was present but not which player it was. Llama3.1-8B-Instruct was used for both the scheduler and the generator. Post-game surveys gathered human feedback on the agent's behavior. Participants rated its similarity to humans, the timing of its messages, and their relevance.
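The reported average follows directly from the dataset totals. The second half of the sketch is only a back-of-envelope illustration under two assumptions: equal participation and no eliminations. Neither holds in real games. It shows what a $1/n$ share of an average game would amount to for the smallest and largest game sizes.

```python
# Totals reported in the text for the LLMafia dataset.
TOTAL_MESSAGES = 2558
NUM_GAMES = 21

avg_per_game = TOTAL_MESSAGES / NUM_GAMES
print(f"average messages per game: {avg_per_game:.2f}")  # 121.81

# Back-of-envelope only: if every player spoke equally and nobody were
# eliminated, one player's 1/n share of an average game would be:
for n_players in (7, 12):
    share = avg_per_game / n_players
    print(f"{n_players} players -> ~{share:.1f} messages per player")
```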

Analysis of Agent Performance

The analysis focuses on message timing, quantity, content, and game performance to evaluate the LLM-agent's behavior relative to human players. The agent's message timing and quantity are similar to those of humans, with lower variance (Table 1, Figure 4). However, the agent tends to send longer messages and uses a larger vocabulary (Table 2). LDA classifiers [cohen2013applied] applied to BGE-M3 embeddings [chen-etal-2024-m3] easily distinguish messages by player type, role, and game phase (Table 3). Despite these differences in content, the agent achieves win rates similar to those of human players (Figure 5).

Figure 5: A comparison of win percentages between human players and the LLM-agent, highlighting similar performance across different roles within the game.
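The length and vocabulary comparison in Table 2 can be approximated as below. The sketch assumes naive lowercase whitespace tokenization and a hypothetical `(player_type, text)` input format. Both are illustrative choices rather than the paper's preprocessing.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def length_and_vocab(messages: Iterable[Tuple[str, str]]) -> Dict[str, Tuple[float, int]]:
    """Return {player_type: (mean words per message, vocabulary size)}."""
    lengths: Dict[str, List[int]] = defaultdict(list)
    vocab: Dict[str, Set[str]] = defaultdict(set)
    for player_type, text in messages:
        tokens = text.lower().split()  # naive whitespace tokenization (assumption)
        lengths[player_type].append(len(tokens))
        vocab[player_type].update(tokens)
    return {
        ptype: (sum(lens) / len(lens), len(vocab[ptype]))
        for ptype, lens in lengths.items()
    }
```

Raw vocabulary size grows with the number of messages. A fair comparison between one agent and many humans would therefore need some normalization, such as per-player vocabularies or equal-sized samples.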

Notably, being overly talkative correlates with being voted out (Figure 6), reinforcing the importance of blending in with typical human communication patterns.

Figure 6: The distribution of speaking ranks for players who were voted out, demonstrating a correlation between excessive talkativeness and elimination from the game.
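Speaking ranks like those plotted in Figure 6 can be computed per game by ordering players by message count. The sketch makes three assumptions. Ranks are based on messages sent, with rank 1 for the most messages. Players who never spoke are included. Ties are broken alphabetically. The paper's exact counting window and tie handling may differ, and both helper names are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def speaking_ranks(senders: List[str], players: List[str]) -> Dict[str, int]:
    """Rank every player in one game by messages sent (1 = most talkative).

    `senders` lists the sender of each message; players who never spoke
    get the lowest ranks. Ties are broken alphabetically (an assumption).
    """
    counts = Counter(senders)  # a Counter returns 0 for players with no messages
    ordered = sorted(players, key=lambda p: (-counts[p], p))
    return {player: rank for rank, player in enumerate(ordered, start=1)}


def voted_out_ranks(ranks: Dict[str, int], voted_out: List[str]) -> List[int]:
    """Speaking ranks of the players eliminated by vote."""
    return [ranks[p] for p in voted_out]
```

Pooling `voted_out_ranks` over all games gives a distribution of the kind shown in Figure 6.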

Human players struggle to detect the agent: only 59.6% identified it correctly. Survey scores for the agent's similarity to human behavior are mediocre, while its message timing scores higher (Table 4). The distribution of time differences between messages is shown in Figure 7.

Figure 7: The distributions of time differences between messages, comparing human and LLM players, indicate similar timing patterns with slightly lower variance for the agent.
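Timing distributions like those in Figure 7 could be computed by measuring, for each message, how long it came after the previous message in the chat. This definition of "time difference" is an assumption. So is the hypothetical `(timestamp_seconds, sender_type)` input format.

```python
from statistics import mean, pstdev
from typing import List, Tuple


def response_gaps(chat: List[Tuple[float, str]], sender_type: str) -> List[float]:
    """Seconds between each `sender_type` message and the chat message just before it.

    The first message in the chat has no predecessor and is skipped.
    """
    ordered = sorted(chat)  # sort by timestamp
    return [
        t - prev_t
        for (prev_t, _), (t, who) in zip(ordered, ordered[1:])
        if who == sender_type
    ]


def summarize(gaps: List[float]) -> Tuple[float, float]:
    """Mean and population standard deviation of the gaps."""
    return mean(gaps), pstdev(gaps)
```

Comparing `summarize(response_gaps(chat, "agent"))` with the human equivalent gives the kind of mean-and-spread contrast described in the caption.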

This work is situated within the broader landscape of multi-agent LLM communication and social AI in games. The authors contrast it with synchronous communication paradigms, such as those found in "Werewolf" [Xu2023ExploringLL, Xu2023LanguageAW], "Resistance Avalon" [Light2023FromTT, Wang2023AvalonsGO], and "Dungeons and Dragons" [CallisonBurch2022DungeonsAD], highlighting the novelty of addressing asynchronous communication. They also discuss turn-taking models [Leite2013TakeOW, ekstedt-skantze-2020-turngpt, umair-etal-2024-large, 10731379, arora2025talkingturnsbenchmarkingaudio] and multi-agent group discussion simulations [Neuberger2024SAUCESA].

Future Directions

Future research directions include exploring alternative asynchrony strategies, such as generating candidate messages before deciding whether to send them. Additionally, fine-tuning the LLM to output a special "<pass>" token when it chooses not to speak could integrate silence more naturally. Augmenting existing platforms such as TextArena [guertler2025textarena] would enable broader data collection and deeper research into LLM social reasoning and coordination.
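Both proposed alternatives can be sketched as small decision wrappers around a model call. These are hypothetical illustrations of the future directions, not an existing implementation. The helper names are invented; only the "<pass>" token name comes from the text.

```python
from typing import Callable, Optional

PASS_TOKEN = "<pass>"  # special token named in the text; handling below is hypothetical


def action_from_output(model_output: str) -> Optional[str]:
    """Interpret a fine-tuned model's output: None means the agent stays silent."""
    text = model_output.strip()
    if not text or text == PASS_TOKEN:
        return None
    return text


def generate_then_decide(
    context: str,
    generate: Callable[[str], str],
    should_send: Callable[[str, str], bool],
) -> Optional[str]:
    """Alternative strategy: draft a candidate first, then decide whether to send it."""
    candidate = generate(context).strip()
    if candidate and should_send(context, candidate):
        return candidate
    return None
```

In the first variant, silence is a single model output. In the second, the send decision can inspect the drafted message, at the cost of generating text that may be discarded.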

Conclusion

This paper demonstrates the feasibility and value of incorporating asynchrony into the communication capabilities of LLMs. The introduced two-stage prompting framework enables agents to participate in asynchronous multi-party communication effectively. The agent's ability to blend into human groups and exhibit human-like speech timing patterns highlights the potential for integrating LLMs in collaborative settings. By modeling asynchrony, LLMs can achieve a richer understanding of human interaction, paving the way for more natural and context-aware participation in group settings.