LLM Cooperation Without Harmful Collusion
Develop methodologies for generating and deploying large language model (LLM) agents that culturally evolve cooperative behaviors when such cooperation benefits human society, while ensuring that these agents refuse to collude against human norms, laws, or interests during multi-agent interactions across successive generations.
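The evaluation setting implied here (as studied in the referenced paper) is a donor game played across generations: agents interact pairwise, the highest-scoring agents survive, and the next generation inherits perturbed copies of the survivors' strategies. The sketch below is a minimal, non-LLM toy model of that generational loop; the payoff constants, fixed-fraction strategies, and mutation scheme are illustrative assumptions, not the paper's exact protocol.

```python
import random

ENDOWMENT = 10.0   # per-round resources for each agent (assumed value)
MULTIPLIER = 2.0   # recipient's benefit per unit donated (assumed value)

def play_round(frac_a, frac_b):
    """Mutual donor game: each agent donates a fraction of its endowment;
    the counterpart receives the donation scaled by MULTIPLIER."""
    give_a, give_b = frac_a * ENDOWMENT, frac_b * ENDOWMENT
    payoff_a = ENDOWMENT - give_a + MULTIPLIER * give_b
    payoff_b = ENDOWMENT - give_b + MULTIPLIER * give_a
    return payoff_a, payoff_b

def evolve(pop, generations=20, rng=None):
    """Pair agents at random, score them, keep the top half, and refill
    the next generation with mutated copies of the survivors."""
    rng = rng or random.Random(0)
    for _ in range(generations):
        rng.shuffle(pop)
        scored = []
        for i in range(0, len(pop), 2):
            pa, pb = play_round(pop[i], pop[i + 1])
            scored += [(pa, pop[i]), (pb, pop[i + 1])]
        scored.sort(reverse=True)
        survivors = [s for _, s in scored[: len(pop) // 2]]
        # Gaussian mutation, clamped to valid donation fractions.
        children = [min(1.0, max(0.0, s + rng.gauss(0, 0.05)))
                    for s in survivors]
        pop = survivors + children
    return pop

# Seed the population with half non-donors and half full donors.
pop = evolve([0.0] * 8 + [1.0] * 8)
```

Because this toy omits reputation and memory, within-round donation is strictly costly to the donor, so selection tends to erode cooperation; the open question above concerns mechanisms (e.g., reputation across generations) that sustain cooperation when it is socially beneficial without enabling collusion against human interests.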
References
Therefore, we end by highlighting a crucial open question: how can we generate LLM agents which are capable of evolving cooperation when it is beneficial to human society, but which refuse to collude against the norms, laws or interests of humans?
— Cultural Evolution of Cooperation among LLM Agents (Vallinder et al., arXiv:2412.10270, 13 Dec 2024), Discussion (final paragraph)