Effect of differentiating through the opponent-gradient dependence in LOLA
Investigate how learning dynamics and outcomes change when Learning with Opponent-Learning Awareness (LOLA) differentiates through the dependence of the opponent-gradient term $\nabla_{\theta_2} V^1(\theta_1, \theta_2)$ on agent 1's parameters $\theta_1$ during the backward pass, rather than treating it as a constant with respect to $\theta_1$, as the current LOLA derivation does.
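The distinction can be made concrete in a minimal sketch. Below, a hypothetical quadratic two-player game stands in for the paper's policy-gradient setting; the value functions `V1`, `V2`, the learning rate `eta`, and the parameter values are illustrative assumptions, not taken from the paper. `full_grad` differentiates the LOLA surrogate through every $\theta_1$-dependence, while `lola_grad` reproduces the original LOLA update, which treats $\nabla_{\theta_2} V^1$ as constant with respect to $\theta_1$ and keeps only the shaping term.

```python
import jax
import jax.numpy as jnp

# Hypothetical quadratic two-player game (illustrative stand-in for the
# paper's policy-gradient setting). V1, V2 are each agent's value.
def V1(t1, t2):
    return -(t1 - 0.5) ** 2 + t1 * t2

def V2(t1, t2):
    return -(t2 + 0.3) ** 2 - t1 * t2

eta = 0.1  # assumed naive learning rate of the opponent

def lola_surrogate(t1, t2):
    # Agent 1's value after the opponent's one anticipated naive step:
    # V1(theta1, theta2 + eta * dV2/dtheta2).
    dV2_dt2 = jax.grad(V2, argnums=1)(t1, t2)
    return V1(t1, t2 + eta * dV2_dt2)

# Full gradient: backpropagates through *all* theta1-dependence,
# including the dependence of dV1/dtheta2 on theta1.
full_grad = jax.grad(lola_surrogate, argnums=0)

def lola_grad(t1, t2):
    # Original LOLA gradient: naive term + shaping term, with
    # dV1/dtheta2 evaluated at (theta1, theta2) and treated as a
    # constant w.r.t. theta1 (i.e. its theta1-dependence is dropped).
    g1 = jax.grad(V1, argnums=0)(t1, t2)
    dV1_dt2 = jax.grad(V1, argnums=1)(t1, t2)
    # Cross second derivative of the opponent's step:
    # d/dtheta1 [ dV2/dtheta2 ].
    cross = jax.grad(lambda a, b: jax.grad(V2, argnums=1)(a, b),
                     argnums=0)(t1, t2)
    return g1 + eta * dV1_dt2 * cross

t1, t2 = jnp.array(0.2), jnp.array(-0.1)
print(float(lola_grad(t1, t2)))  # LOLA gradient (dependence dropped)
print(float(full_grad(t1, t2)))  # full gradient (extra cross term kept)
```

Because the game here is quadratic, the gap between the two gradients equals exactly the dropped term $\eta\, (\nabla_{\theta_1}\nabla_{\theta_2} V^1)\, \nabla_{\theta_2} V^2$; in the general policy-gradient case, whether restoring that term stabilizes or changes the learning outcomes is precisely the open question.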
References
Since LOLA focuses on this shaping of the learning direction of the opponent, the dependency of $\nabla_{\theta_2} V^1(\theta_1, \theta_2)$ on $\theta_1$ is dropped during the backward pass. Investigation of how differentiating through this term would affect the learning outcomes is left for future work.
— Learning with Opponent-Learning Awareness (Foerster et al., 2017, arXiv:1709.04326), Section 3.2 (Learning with Opponent Learning Awareness)