Welfare Diplomacy: Benchmarking Language Model Cooperation (2310.08901v1)

Published 13 Oct 2023 in cs.MA, cs.AI, and cs.CL

Abstract: The growing capabilities and increasingly widespread deployment of AI systems necessitate robust benchmarks for measuring their cooperative capabilities. Unfortunately, most multi-agent benchmarks are either zero-sum or purely cooperative, providing limited opportunities for such measurements. We introduce a general-sum variant of the zero-sum board game Diplomacy -- called Welfare Diplomacy -- in which players must balance investing in military conquest and domestic welfare. We argue that Welfare Diplomacy facilitates both a clearer assessment of and stronger training incentives for cooperative capabilities. Our contributions are: (1) proposing the Welfare Diplomacy rules and implementing them via an open-source Diplomacy engine; (2) constructing baseline agents using zero-shot prompted LLMs; and (3) conducting experiments where we find that baselines using state-of-the-art models attain high social welfare but are exploitable. Our work aims to promote societal safety by aiding researchers in developing and assessing multi-agent AI systems. Code to evaluate Welfare Diplomacy and reproduce our experiments is available at https://github.com/mukobi/welfare-diplomacy.
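
To make contribution (2) more concrete, below is a minimal, hypothetical sketch of what a zero-shot prompted LLM baseline agent for Welfare Diplomacy could look like. The names (`GameView`, `propose_orders`, `complete`) and the prompt wording are illustrative assumptions, not the authors' implementation; see the linked repository for the actual code.

```python
# Hypothetical sketch of a zero-shot prompted baseline agent for Welfare
# Diplomacy. All names and the prompt text are illustrative assumptions.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class GameView:
    """A plain-text summary of what one power can observe this phase."""
    power: str            # e.g. "FRANCE"
    phase: str            # e.g. "SPRING 1901 MOVEMENT"
    board_summary: str    # textual description of units and supply centers
    messages: str         # diplomacy messages received this phase


def propose_orders(view: GameView, complete: Callable[[str], str]) -> List[str]:
    """Zero-shot prompt a language model for this phase's orders.

    `complete` is any text-completion callable (hosted model, local model, ...);
    it is a parameter because this sketch does not assume a particular
    provider or prompt format.
    """
    prompt = (
        f"You are playing {view.power} in Welfare Diplomacy, a general-sum "
        "variant of Diplomacy in which you must balance military strength "
        "against investing in your nation's welfare.\n\n"
        f"Phase: {view.phase}\n"
        f"Board: {view.board_summary}\n"
        f"Messages: {view.messages}\n\n"
        "Reply with one order per line, in standard Diplomacy notation."
    )
    reply = complete(prompt)
    # Keep non-empty lines as candidate orders; a real agent would validate
    # them against the engine's set of legal orders before submitting.
    return [line.strip() for line in reply.splitlines() if line.strip()]


if __name__ == "__main__":
    # Stub completion function so the sketch runs without any model access.
    demo_view = GameView(
        power="FRANCE",
        phase="SPRING 1901 MOVEMENT",
        board_summary="A PAR, A MAR, F BRE; 3 supply centers.",
        messages="(none)",
    )
    print(propose_orders(demo_view, lambda _prompt: "A PAR - BUR\nF BRE - MAO"))
```

A full agent would additionally handle negotiation messages and retreat/build phases, but the core loop of summarizing state into a prompt and parsing orders back out is the part the zero-shot baseline idea hinges on.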

Authors (6)
  1. Gabriel Mukobi (10 papers)
  2. Hannah Erlebach (1 paper)
  3. Niklas Lauffer (11 papers)
  4. Lewis Hammond (18 papers)
  5. Alan Chan (23 papers)
  6. Jesse Clifton (8 papers)
Citations (14)