Action Volume as an Escalation Indicator in LLM‑Based Wargames

Determine whether an increase over time in the number of actions taken by large language model (LLM) agents in simulated wargames is a reliable indicator of escalation. Doing so requires quantitatively characterizing the relationship, if any, between action volume and escalation severity in LLM-driven scenarios.

Background

In human wargames, rising action counts over time have served as an auxiliary signal of escalation. Rivera et al. (2024) compute total actions per nation over time in LLM-driven simulations and compare the resulting patterns, but they do not find sufficient evidence to validate this metric for LLM agents.

Clarifying whether action volume reliably tracks or predicts escalation in LLM-based wargames would improve evaluation frameworks and help interpret simulation outputs when assessing the risks of autonomous decision-making agents.
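
One way to begin characterizing the relationship is a rank correlation between per-turn action counts and a per-turn escalation score. The sketch below is illustrative only and is not the analysis used in the source paper: the log format (nation, turn, action), the escalation scores, and the function names are all assumptions, and the toy data is invented for the example. Spearman correlation is chosen because it assumes only a monotonic relationship, not a linear one.

```python
from collections import defaultdict
from math import sqrt


def actions_per_turn(log):
    """Count actions per (nation, turn) from (nation, turn, action) records.

    The record layout is an assumption made for this sketch.
    """
    counts = defaultdict(int)
    for nation, turn, _action in log:
        counts[(nation, turn)] += 1
    return dict(counts)


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation, computed as Pearson correlation of ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined when one series is constant
    return cov / sqrt(vx * vy)


# Hypothetical toy data, not results from any simulation.
log = [
    ("A", 1, "message"),
    ("A", 2, "message"), ("A", 2, "sanction"),
    ("A", 3, "message"), ("A", 3, "sanction"), ("A", 3, "blockade"),
]
escalation = {("A", 1): 2.0, ("A", 2): 5.0, ("A", 3): 9.0}  # assumed scores

counts = actions_per_turn(log)
keys = sorted(counts)
rho = spearman([counts[k] for k in keys], [escalation[k] for k in keys])
```

A high correlation on real simulation logs would show only that the two series move together. Whether action volume predicts escalation would need further checks, such as lagged comparisons and consistency across models and scenarios.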

References

In previous, human-based wargames, more actions over time were an additional indicator of escalation in wargames. Given our results, we can neither confirm nor reject this notion in LLM-based wargames.

— Escalation Risks from Language Models in Military and Diplomatic Decision-Making (2401.03408 - Rivera et al., 2024) in Appendix, Total Action Counts Over Time figure caption
