Methods for more consistent update functions in the Iterated Prisoner’s Dilemma

Develop learning methods for the Iterated Prisoner’s Dilemma that produce update functions with lower consistency loss—i.e., more consistent solutions under mutual opponent shaping—than those currently demonstrated.

Background

Consistency is formalized via a set of equations that update functions must satisfy under mutual opponent shaping, and deviation from these equations is measured by a consistency loss. While COLA improves over HOLA in the Iterated Prisoner's Dilemma (IPD), the update functions it learns for the IPD still exhibit a relatively high consistency loss compared to those it learns for other games.
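To make the consistency loss concrete, the sketch below computes it for a toy bilinear two-player game rather than the IPD itself. The game, the learning rate `alpha`, and the closed-form solution are all illustrative assumptions, not part of the paper; the point is only the structure of the loss: each player's update should equal its lookahead gradient evaluated at the parameters after *both* players' updates are applied, and the loss is the squared residual of those two equations.

```python
# Hedged sketch of a COLA-style consistency loss on a hypothetical
# bilinear game V1(x, y) = x*y, V2(x, y) = -x*y (NOT the IPD).
# The consistency equations for an update pair (f1, f2) read:
#   f1 = alpha * dV1/dx evaluated at (x + f1, y + f2) = alpha * (y + f2)
#   f2 = alpha * dV2/dy evaluated at (x + f1, y + f2) = -alpha * (x + f1)

def consistency_loss(f1, f2, x, y, alpha):
    """Sum of squared residuals of the two consistency equations."""
    r1 = f1 - alpha * (y + f2)   # player 1's residual
    r2 = f2 + alpha * (x + f1)   # player 2's residual
    return r1 ** 2 + r2 ** 2

alpha, x, y = 0.1, 1.0, 1.0

# Naive simultaneous-gradient updates ignore the opponent's update,
# so they leave a nonzero consistency loss ...
naive = consistency_loss(alpha * y, -alpha * x, x, y, alpha)

# ... whereas solving the two linear equations jointly (possible in
# closed form for this toy game) drives the loss to zero.
f1_star = alpha * (y - alpha * x) / (1 + alpha ** 2)
f2_star = -alpha * (x + alpha * y) / (1 + alpha ** 2)
exact = consistency_loss(f1_star, f2_star, x, y, alpha)

print(naive)  # alpha**4 * (x**2 + y**2), about 2e-4 here
print(exact)  # essentially 0
```

In the paper's setting the update functions are neural networks and the games (including the IPD) have no such closed form, so COLA minimizes this residual by gradient descent; the open problem is that in the IPD the minimized loss remains comparatively large.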

Reducing this loss in the IPD is identified as a concrete target for future work: new or refined methods are needed that better satisfy the consistency conditions in this setting.

References

For the IPD, COLA's consistency losses are high compared to other games, but much lower than HOLA's consistency losses at high look-ahead rates. We leave it to future work to find methods that obtain more consistent solutions.

COLA: Consistent Learning with Opponent-Learning Awareness (2203.04098 - Willi et al., 2022) in Results, Update functions (Section 6)