
Effect of differentiating through the opponent-gradient dependence in LOLA

Investigate how learning dynamics and outcomes change when Learning with Opponent-Learning Awareness (LOLA) backpropagates through the dependence of the gradient of agent 1’s value with respect to agent 2’s parameters, ∇_{θ2} V1(θ1, θ2), on agent 1’s parameters θ1 during the backward pass, rather than dropping this dependence as in the original LOLA derivation.


Background

In the derivation of the LOLA update, the objective V1(θ1, θ2 + Δθ2) is optimized with respect to θ1 while anticipating a naive learning step Δθ2 for the opponent. When differentiating this objective with respect to θ1, the LOLA formulation keeps the term that shapes the opponent’s update Δθ2 but drops the dependence of the cross-gradient ∇_{θ2} V1(θ1, θ2) on θ1 during the backward pass.
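
Concretely, writing the naive opponent step as Δθ2 = η ∇_{θ2} V2(θ1, θ2), a first-order Taylor expansion gives (the explicit split below into kept and dropped terms is a reconstruction of the paper’s derivation, not a quotation):

$$
V^1(\theta^1, \theta^2 + \Delta\theta^2) \;\approx\; V^1(\theta^1, \theta^2) \;+\; (\Delta\theta^2)^\top \nabla_{\theta^2} V^1(\theta^1, \theta^2).
$$

Differentiating with respect to $\theta^1$ by the product rule then yields

$$
\nabla_{\theta^1} V^1(\theta^1, \theta^2 + \Delta\theta^2) \;\approx\;
\nabla_{\theta^1} V^1
\;+\; \eta\,\big(\nabla_{\theta^2} V^1\big)^{\top} \nabla_{\theta^1}\nabla_{\theta^2} V^2
\;+\; \eta\,\big(\nabla_{\theta^2} V^2\big)^{\top} \nabla_{\theta^1}\nabla_{\theta^2} V^1 ,
$$

where the second (opponent-shaping) term is retained in the LOLA update, while the third term, which carries the dependence of $\nabla_{\theta^2} V^1$ on $\theta^1$, is the one dropped during the backward pass.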

The authors explicitly note that including this dropped dependency could change the learning behavior and leave its investigation to future work. How retaining and differentiating through this dependence influences convergence, stability, and emergent strategies (e.g., cooperation) therefore remains an unresolved question in the formulation of LOLA.
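
As an illustration of the two gradient variants, the sketch below assumes a one-shot prisoner’s dilemma with scalar policy logits; the payoff values, function names such as `lola_grad`, and the `keep_dropped_term` flag are illustrative assumptions, not the authors’ code. It computes the first-order LOLA gradient for agent 1 with and without the dropped term.

```python
import jax
import jax.numpy as jnp

# Illustrative one-shot prisoner's dilemma payoffs (agent 1 is the row player).
# Actions: 0 = cooperate, 1 = defect.
R1 = jnp.array([[-1., -3.],
                [ 0., -2.]])   # agent 1's payoffs
R2 = R1.T                      # symmetric game: agent 2's payoffs

def policy(theta):
    """Probability vector over (cooperate, defect) from a scalar logit."""
    p = jax.nn.sigmoid(theta)
    return jnp.array([p, 1.0 - p])

def V1(theta1, theta2):
    """Expected payoff of agent 1."""
    return policy(theta1) @ R1 @ policy(theta2)

def V2(theta1, theta2):
    """Expected payoff of agent 2."""
    return policy(theta1) @ R2 @ policy(theta2)

def lola_grad(theta1, theta2, eta=1.0, keep_dropped_term=False):
    """First-order LOLA gradient for agent 1.

    keep_dropped_term=False mirrors the update described in the paper, where
    the dependence of grad_{theta2} V1 on theta1 is ignored; True adds the
    extra product-rule term that the open question asks about.
    """
    dV1_d1 = jax.grad(V1, argnums=0)(theta1, theta2)   # naive gradient term
    dV1_d2 = jax.grad(V1, argnums=1)(theta1, theta2)
    dV2_d2 = jax.grad(V2, argnums=1)(theta1, theta2)
    # Cross second derivatives (scalars here; matrices in the general case).
    d2V2_d1d2 = jax.grad(jax.grad(V2, argnums=1), argnums=0)(theta1, theta2)
    d2V1_d1d2 = jax.grad(jax.grad(V1, argnums=1), argnums=0)(theta1, theta2)

    shaping = eta * dV1_d2 * d2V2_d1d2   # opponent-shaping term LOLA keeps
    dropped = eta * dV2_d2 * d2V1_d1d2   # term LOLA drops
    return dV1_d1 + shaping + (dropped if keep_dropped_term else 0.0)

theta1, theta2 = jnp.array(0.1), jnp.array(-0.2)
print("LOLA (dependence dropped): ", lola_grad(theta1, theta2))
print("LOLA + dropped term:       ", lola_grad(theta1, theta2, keep_dropped_term=True))
```

Comparing the two outputs over many parameter settings (or running the corresponding learning dynamics) is one way to probe the question empirically in this toy setting.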

References

Since LOLA focuses on this shaping of the learning direction of the opponent, the dependency of $\nabla_{\theta^2} V^1(\theta^1, \theta^2)$ on $\theta^1$ is dropped during the backward pass. Investigation of how differentiating through this term would affect the learning outcomes is left for future work.

Learning with Opponent-Learning Awareness (1709.04326 - Foerster et al., 2017) in Section 3.2 (Learning with Opponent Learning Awareness)