Best LLM role-play granularity for wargame simulations
Determine whether instructing a large language model (such as GPT-3.5 or GPT-4) to simulate the player team collectively, to simulate several characters jointly within one prompt, or to role-play as individual agents with nuanced roles yields better results for wargame experiments that compare LLM-simulated decisions against human expert teams in a U.S.–China National Security Council crisis scenario.
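The three granularities in question can be made concrete as prompt templates. The sketch below is purely illustrative: the role names, scenario text, and prompt wording are assumptions for demonstration, not material from the cited paper or its experimental setup.

```python
# Sketch of three prompting granularities: one prompt for the whole team,
# one prompt voicing all characters jointly, or one prompt per individual
# agent. All names and wording here are illustrative assumptions.

ROLES = ["National Security Advisor", "Secretary of Defense", "Secretary of State"]
SCENARIO = "A U.S.-China crisis scenario is briefed to the National Security Council."

def team_prompt(scenario: str) -> str:
    """Coarsest granularity: the model answers as the player team collectively."""
    return (f"{scenario}\n"
            "You are the NSC player team. Deliberate internally and return "
            "one consolidated course of action.")

def joint_characters_prompt(scenario: str, roles: list[str]) -> str:
    """Middle granularity: one completion voices every character in turn."""
    speakers = ", ".join(roles)
    return (f"{scenario}\n"
            f"Simulate a discussion among {speakers}, labeling each speaker, "
            "then state the team's final decision.")

def individual_agent_prompts(scenario: str, roles: list[str]) -> list[str]:
    """Finest granularity: one prompt per agent, each with a nuanced role."""
    return [(f"{scenario}\n"
             f"You are the {role}. Argue from that role's institutional "
             "perspective and recommend an action.") for role in roles]

prompts = individual_agent_prompts(SCENARIO, ROLES)
```

Under the finest granularity each prompt would be sent to a separate model instance, and the resulting recommendations aggregated into a team decision; the other two granularities require only a single call.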
References
Also, it is still unclear whether tasking the LLMs to simulate a player team, a combination of characters, or to role-play as individuals with a more nuanced view would yield better results for similar experiments \citep{Shanahan2023}.
— Lamparth et al. (2024), "Human vs. Machine: Behavioral Differences Between Expert Humans and Language Models in Wargame Simulations," arXiv:2403.03407, Section: Discussions, paragraph 2.