Methods for more consistent update functions in the Iterated Prisoner’s Dilemma
Develop learning methods for the Iterated Prisoner's Dilemma (IPD) that produce update functions with lower consistency loss than current approaches, i.e., solutions that remain more consistent under mutual opponent shaping.
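For concreteness, the consistency criterion at issue can be sketched as follows. This is a minimal illustration, not COLA's implementation: the exact memory-one IPD value uses the analytic formulation popularized by LOLA, the payoff values and state ordering are common conventions, and the function names (`ipd_values`, `consistency_loss`) are hypothetical. A pair of update functions is consistent when each agent's update matches the gradient of its value evaluated at the opponent's already-updated parameters; the consistency loss measures the squared deviation from that fixed-point condition.

```python
import jax
import jax.numpy as jnp

def ipd_values(th1, th2, gamma=0.96):
    """Exact discounted IPD values for memory-one policies.

    th1, th2: logits of P(cooperate) in states [start, CC, CD, DC, DD],
    with states named from agent 1's perspective. Payoffs here use the
    (negative) convention r1 = [-1, -3, 0, -2] over [CC, CD, DC, DD].
    """
    p1 = jax.nn.sigmoid(th1)
    p2 = jax.nn.sigmoid(th2)
    a = p1[1:]                                # agent 1's P(C) per state
    b = p2[1:][jnp.array([0, 2, 1, 3])]       # agent 2 sees CD/DC swapped
    # Initial distribution over (CC, CD, DC, DD).
    p0 = jnp.array([p1[0] * p2[0],
                    p1[0] * (1 - p2[0]),
                    (1 - p1[0]) * p2[0],
                    (1 - p1[0]) * (1 - p2[0])])
    # Row s of P is the next-state distribution from state s.
    P = jnp.stack([a * b, a * (1 - b), (1 - a) * b, (1 - a) * (1 - b)],
                  axis=1)
    r1 = jnp.array([-1., -3., 0., -2.])
    r2 = jnp.array([-1., 0., -3., -2.])
    # V_i = p0^T (I - gamma P)^{-1} r_i, solved rather than inverted.
    M = jnp.linalg.solve(jnp.eye(4) - gamma * P,
                         jnp.stack([r1, r2], axis=1))
    v = p0 @ M
    return v[0], v[1]

def consistency_loss(f1, f2, th1, th2, lr=1.0):
    """COLA-style consistency loss for update functions f1, f2.

    Each f_i maps (th1, th2) to an update for agent i. The pair is
    consistent when each update equals lr times the gradient of that
    agent's value at the opponent's *updated* parameters.
    """
    u1, u2 = f1(th1, th2), f2(th1, th2)
    g1 = jax.grad(lambda t1: ipd_values(t1, th2 + u2)[0])(th1)
    g2 = jax.grad(lambda t2: ipd_values(th1 + u1, t2)[1])(th2)
    return jnp.sum((u1 - lr * g1) ** 2) + jnp.sum((u2 - lr * g2) ** 2)
```

Under this formulation, the research problem is to find update functions (for COLA, neural networks trained on this loss) that drive `consistency_loss` lower on the IPD than the values reported in the paper.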
References
For the IPD, COLA's consistency losses are high compared to other games, but much lower than HOLA's consistency losses at high look-ahead rates. We leave it to future work to find methods that obtain more consistent solutions.
— COLA: Consistent Learning with Opponent-Learning Awareness
(arXiv:2203.04098, Willi et al., 2022), Section 6, Results: Update functions