Inequity aversion improves cooperation in intertemporal social dilemmas (1803.08884v3)

Published 23 Mar 2018 in cs.NE, cs.AI, cs.GT, cs.MA, and q-bio.PE

Abstract: Groups of humans are often able to find ways to cooperate with one another in complex, temporally extended social dilemmas. Models based on behavioral economics are only able to explain this phenomenon for unrealistic stateless matrix games. Recently, multi-agent reinforcement learning has been applied to generalize social dilemma problems to temporally and spatially extended Markov games. However, this has not yet generated an agent that learns to cooperate in social dilemmas as humans do. A key insight is that many, but not all, human individuals have inequity averse social preferences. This promotes a particular resolution of the matrix game social dilemma wherein inequity-averse individuals are personally pro-social and punish defectors. Here we extend this idea to Markov games and show that it promotes cooperation in several types of sequential social dilemma, via a profitable interaction with policy learnability. In particular, we find that inequity aversion improves temporal credit assignment for the important class of intertemporal social dilemmas. These results help explain how large-scale cooperation may emerge and persist.

Inequity Aversion as a Mechanism for Cooperative Behavior in Intertemporal Social Dilemmas

The paper examines the role of inequity aversion in fostering cooperation within intertemporal social dilemmas, expanding upon classical models of social behavior and offering insights into multi-agent dynamics. It introduces a framework that incorporates inequity aversion into multi-agent reinforcement learning (MARL), adapting Fehr and Schmidt's model to Markov games, a more complex and realistic setting for analyzing social dilemmas. This approach is contrasted with existing accounts grounded in rational choice theory and behavioral economics.
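For reference, the classical Fehr–Schmidt utility that the paper adapts is, for agent $i$ in an $N$-player game with payoffs $x_1, \dots, x_N$,

$$U_i(x) = x_i - \frac{\alpha_i}{N-1}\sum_{j \neq i}\max(x_j - x_i,\, 0) - \frac{\beta_i}{N-1}\sum_{j \neq i}\max(x_i - x_j,\, 0),$$

where $\alpha_i$ weights disadvantageous inequity ("envy") and $\beta_i$ weights advantageous inequity ("guilt"). Roughly speaking, the Markov-game adaptation replaces the one-shot payoffs with temporally smoothed reward traces, so that inequity is evaluated over recent experience rather than a single timestep.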

The research explicitly focuses on two primary behavioral responses to inequity: advantageous inequity aversion, akin to a sense of guilt, and disadvantageous inequity aversion, akin to envy. These are considered within the context of partially observable Markov games involving multiple agents. The authors identify two critical factors affecting cooperation: collective action, necessary to achieve socially optimal equilibria, and temporal credit assignment, which requires agents to associate short-term actions with long-term consequences.
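The sketch below illustrates how such a subjective reward might be computed for a group of learning agents. It is a minimal illustration, not the paper's implementation: the function names, the smoothing scheme, and the coefficient values are assumptions chosen for clarity.

```python
import numpy as np

def update_smoothed_rewards(e_prev, rewards, gamma=0.99, lam=0.95):
    """Temporally smoothed per-agent reward traces.

    Smoothing lets agents compare recent cumulative outcomes rather than a
    single timestep; the exact form and constants here are illustrative.
    """
    return gamma * lam * np.asarray(e_prev, dtype=float) + np.asarray(rewards, dtype=float)

def inequity_averse_rewards(rewards, e, alpha=5.0, beta=0.05):
    """Fehr-Schmidt-style subjective rewards for N agents.

    alpha: weight on disadvantageous inequity ("envy"),
    beta:  weight on advantageous inequity ("guilt").
    Coefficient values are placeholders, not the published settings.
    """
    rewards = np.asarray(rewards, dtype=float)
    e = np.asarray(e, dtype=float)
    n = len(rewards)
    u = rewards.copy()
    for i in range(n):
        envy = sum(max(e[j] - e[i], 0.0) for j in range(n) if j != i)
        guilt = sum(max(e[i] - e[j], 0.0) for j in range(n) if j != i)
        u[i] -= alpha / (n - 1) * envy + beta / (n - 1) * guilt
    return u

# Example: agent 0 earned more than the others this step, so its "guilt"
# term lowers its subjective reward, while the other agents feel "envy".
e = update_smoothed_rewards(e_prev=[0.0, 0.0, 0.0], rewards=[1.0, 0.0, 0.0])
print(inequity_averse_rewards([1.0, 0.0, 0.0], e))
```

In this sketch the inequity terms act as an intrinsic penalty added to the environment reward; each agent then trains on its subjective reward with an otherwise standard MARL algorithm.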

Two distinct games, Cleanup (a public goods dilemma) and Harvest (a commons dilemma), are structured as partially observable Markov games and serve as testing grounds for the model. In Cleanup, agents must contribute effort to keep a shared aquifer clean so that apples continue to regrow, while in Harvest, agents must moderate their consumption of a regenerating resource to keep it sustainable. Both games are designed to highlight the tension between individual and collective incentives that is characteristic of intertemporal social dilemmas.

The experiments show that advantageous and disadvantageous inequity aversion differ in effectiveness across game types. Advantageous inequity aversion facilitates cooperation, especially in public-goods settings like Cleanup, where agents must contribute actively to a shared resource pool. This form of social preference improves temporal credit assignment by providing intrinsic rewards aligned with long-term group benefits. However, such agents need to constitute a large portion of the population for cooperative dynamics to emerge.

Conversely, disadvantageous inequity aversion is particularly impactful in the Harvest game, where it motivates the punishment of defectors. It effectively transforms the game's payoff structure, discouraging free-riding through timely punishment by aggrieved agents. Notably, these agents do not need to dominate the population for cooperation to emerge, indicating the potential of this mechanism in environments where enforcement of prosocial norms is feasible.

Despite the promising results, the paper notes limitations, such as the potential exploitation of inequity-averse agents and the practical challenge of determining the appropriate population mix in heterogeneous settings. Exploring evolved intrinsic rewards and hybrid strategies that could yield more robust cooperation in stochastic or large-population games is suggested as an avenue for future research.

This research contributes empirical evidence and theoretical justification for inequity aversion as a means of enhancing cooperation. The findings suggest that integrating social preferences into reinforcement learning algorithms could increase the effectiveness of AI in multi-agent interactions, harmonizing individual strategies with group-level success. The work bridges human behavioral studies and simulated multi-agent environments, furthering our understanding of how complex cooperative behavior could emerge in artificial systems and bringing their behavior closer to human-like resolutions of social dilemmas. The paper thus not only deepens the foundational understanding of cooperative mechanisms but also proposes a potentially scalable framework for real-world applications in distributed artificial intelligence and beyond.

Authors (12)
  1. Edward Hughes (40 papers)
  2. Joel Z. Leibo (70 papers)
  3. Matthew G. Phillips (1 paper)
  4. Karl Tuyls (58 papers)
  5. Edgar A. Duéñez-Guzmán (14 papers)
  6. Iain Dunning (10 papers)
  7. Tina Zhu (4 papers)
  8. Kevin R. McKee (28 papers)
  9. Heather Roff (2 papers)
  10. Thore Graepel (48 papers)
  11. Antonio García Castañeda (3 papers)
  12. Raphael Koster (11 papers)
Citations (199)