Convergence under the deadly triad

Establish general convergence guarantees or characterize necessary and sufficient stability conditions for reinforcement learning algorithms that concurrently employ function approximation, bootstrapping, and off-policy learning (the deadly triad), including deep neural network–based methods.

Background

The survey identifies the interaction of three widely used components—function approximation, bootstrapping, and off-policy learning—as a fundamental source of instability in reinforcement learning. While each component is desirable on its own, their combination can lead to divergence, as illustrated by Baird’s counterexample and subsequent analyses.
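The divergence can be reproduced in a few lines. Below is a minimal sketch of semi-gradient off-policy TD(0) on Baird's counterexample, following the standard setup (7 states, 8 linear weights, zero rewards, γ = 0.99); the step size, seed, and print cadence are illustrative choices, not prescribed by the survey.

import numpy as np

# Baird's counterexample: 7 states, 8 weights, linear values v(s) = x(s)^T w.
X = np.zeros((7, 8))
for i in range(6):
    X[i, i] = 2.0   # upper states: v(s_i) = 2*w_i + w_7
    X[i, 7] = 1.0
X[6, 6] = 1.0       # lower state: v(s_6) = w_6 + 2*w_7
X[6, 7] = 2.0

gamma, alpha = 0.99, 0.01
rng = np.random.default_rng(0)

# Behavior policy: 'dashed' (uniform jump to states 0..5) w.p. 6/7,
# 'solid' (jump to state 6) w.p. 1/7. Target policy: always 'solid'.
w = np.array([1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 10.0, 1.0])
s = int(rng.integers(7))
for step in range(1001):
    if rng.random() < 1.0 / 7.0:                 # behavior chose 'solid'
        s_next, rho = 6, 7.0                     # rho = pi(a|s)/b(a|s) = 1/(1/7)
    else:                                        # behavior chose 'dashed'
        s_next, rho = int(rng.integers(6)), 0.0  # target never takes 'dashed'
    delta = gamma * X[s_next] @ w - X[s] @ w     # TD error (all rewards are zero)
    w += alpha * rho * delta * X[s]              # semi-gradient off-policy TD(0)
    s = s_next
    if step % 250 == 0:
        print(f"step {step:4d}  ||w|| = {np.linalg.norm(w):.1f}")
# The weight norm grows without bound even though w = 0 is an exact solution.

All three triad elements are present: linear function approximation (X), bootstrapping (the γ·v(s′) term in the TD error), and off-policy learning (the importance ratio ρ). Removing any one of them restores stability on this problem.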

Although practical remedies exist (e.g., target networks, gradient TD, regularization), a general theoretical characterization that guarantees convergence or delineates precise stability conditions for algorithms combining all three elements remains unsettled. This gap is highlighted as a central open question for reinforcement learning theory, especially in the context of deep function approximation.
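For contrast, here is a sketch of one such remedy, the gradient-TD method TDC (Sutton et al., 2009), on the same problem. The auxiliary weight vector and the two step sizes (α = 0.005, β = 0.05, as used in Sutton & Barto's demonstration) are part of the method; the horizon and seed are again illustrative.

import numpy as np

# Same Baird features as in the sketch above.
X = np.zeros((7, 8))
for i in range(6):
    X[i, i], X[i, 7] = 2.0, 1.0
X[6, 6], X[6, 7] = 1.0, 2.0

gamma, alpha, beta = 0.99, 0.005, 0.05
rng = np.random.default_rng(0)

w = np.array([1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 10.0, 1.0])
v = np.zeros(8)   # auxiliary weights: linear estimate of the expected TD error
s = int(rng.integers(7))
for step in range(20001):
    if rng.random() < 1.0 / 7.0:
        s_next, rho = 6, 7.0
    else:
        s_next, rho = int(rng.integers(6)), 0.0
    x, x_next = X[s], X[s_next]
    delta = gamma * x_next @ w - x @ w                     # reward = 0
    # TDC updates with importance weighting:
    w += alpha * rho * (delta * x - gamma * (x @ v) * x_next)
    v += beta * rho * (delta - x @ v) * x
    s = s_next
    if step % 5000 == 0:
        print(f"step {step:5d}  ||w|| = {np.linalg.norm(w):.3f}")
# Unlike semi-gradient TD, the weights remain bounded and the projected
# Bellman error is driven toward zero.

Such guarantees, however, are largely confined to linear function approximation; extending them to deep networks, or characterizing exactly when they fail, is the open problem stated above.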

References

Each is desirable in isolation. Their interaction, known as the deadly triad (Sutton & Barto, 2018, Ch. 11), is the central open problem in reinforcement learning theory.

A Survey of Reinforcement Learning For Economics (2603.08956 - Rawat, 9 Mar 2026) in Section “The Central Challenge: The Deadly Triad”