Fundamental open problem in opponent shaping: arrogant behavior persists under consistency
Investigate why agents trained with opponent shaping algorithms in differentiable games exhibit "arrogant" behavior even when their update functions are consistent under mutual opponent shaping. Characterize the phenomenon and determine which principled modifications, if any, resolve it.
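To make "arrogant" behavior concrete, the following is a minimal sketch and not the setup of the cited paper. It assumes the two-player Tandem game, with losses L1 = (x + y)^2 - 2x and L2 = (x + y)^2 - 2y, as an illustrative differentiable game. It compares plain simultaneous gradient descent with a simple look-ahead shaping rule in which each agent differentiates through the opponent's anticipated naive step. The shaping rule here is an ordinary look-ahead update, not a consistent update function in the sense of COLA; the function names, the look-ahead rate `alpha`, and the step size `lr` are all hypothetical choices made for illustration.

```python
# Assumed example game (Tandem game), not taken from the problem statement:
#   L1(x, y) = (x + y)^2 - 2x,   L2(x, y) = (x + y)^2 - 2y
# Both gradients depend only on s = x + y, and the game is symmetric.

def naive_grad(s):
    """dL1/dx = dL2/dy = 2(x + y) - 2 for plain gradient descent."""
    return 2.0 * s - 2.0


def make_shaping_grad(alpha):
    """Gradient of L1(x, y + dy) w.r.t. x, where dy = -alpha * naive_grad(s).

    Substituting dy gives L1 = ((1 - 2*alpha) * s + 2*alpha)^2 - 2x, so the
    agent differentiates through the opponent's anticipated step. By symmetry
    the same expression is agent 2's gradient w.r.t. y.
    """
    c = 1.0 - 2.0 * alpha

    def shaping_grad(s):
        return 2.0 * c * (c * s + 2.0 * alpha) - 2.0

    return shaping_grad


def run(grad_fn, x=0.0, y=0.0, lr=0.05, steps=2000):
    """Simultaneous updates for both agents using the same gradient rule."""
    for _ in range(steps):
        g = grad_fn(x + y)
        x, y = x - lr * g, y - lr * g
    return x, y


def total_loss(x, y):
    """L1 + L2 = 2(x + y)^2 - 2(x + y)."""
    s = x + y
    return 2.0 * s * s - 2.0 * s


naive_xy = run(naive_grad)
shaped_xy = run(make_shaping_grad(alpha=0.1))

print("naive : x + y =", sum(naive_xy), " total loss =", total_loss(*naive_xy))
print("shaped: x + y =", sum(shaped_xy), " total loss =", total_loss(*shaped_xy))
```

Under these assumptions, plain gradient descent settles on the line x + y = 1, where the summed loss is zero. The look-ahead rule instead settles at x + y = (1 - 2α + 4α²) / (1 - 2α)², which is 1.3125 for α = 0.1, so both agents overshoot and end with a strictly higher summed loss. Each agent acts as if the opponent will take the anticipated naive step, yet the opponent is shaping too. The open problem asks why behavior of this kind persists even after the update functions are made mutually consistent.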
References
It was believed that inconsistency leads to arrogant behaviour and lack of preservation of SFPs. We showed that even with consistency, opponent shaping behaves arrogantly, pointing towards a fundamental open problem for the method.
— COLA: Consistent Learning with Opponent-Learning Awareness
(2203.04098 - Willi et al., 2022) in Conclusion and Future Work (Section 7)