Convergence under the deadly triad
Establish general convergence guarantees, or characterize necessary and sufficient stability conditions, for reinforcement learning algorithms that combine function approximation, bootstrapping, and off-policy learning (the deadly triad), including methods based on deep neural networks.
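As a concrete illustration of why such guarantees are hard to obtain, the sketch below runs semi-gradient TD(0) with a one-parameter linear approximator on a well-known two-state construction in which all three ingredients are present. It is not taken from the cited survey; the function name, step size, discount factor, and step count are assumptions chosen for illustration. The two states have features 1 and 2, every reward is zero, and only the transition from the first state to the second is ever updated, which is the off-policy element.

```python
def semi_gradient_td_two_state(w0=1.0, alpha=0.1, gamma=0.9, steps=100):
    """Semi-gradient TD(0) on a two-state example (hypothetical helper).

    Function approximation: v(s) = w * x(s), with features x = 1 and x = 2.
    Bootstrapping: the target uses the current estimate of the next state.
    Off-policy: only the first-to-second transition is ever sampled.
    """
    x_s, x_next = 1.0, 2.0  # features of the current and next state
    reward = 0.0            # every reward is zero, so the true values are zero
    w = w0
    history = [w]
    for _ in range(steps):
        # TD error: bootstrapped target minus current estimate
        td_error = reward + gamma * w * x_next - w * x_s
        # Semi-gradient step: the gradient of v(s) with respect to w is x_s
        w += alpha * td_error * x_s
        history.append(w)
    return history


history = semi_gradient_td_two_state()
print(f"w after {len(history) - 1} updates: {history[-1]:.2f}")
```

Under these assumptions each update multiplies w by 1 + alpha * (2 * gamma - 1), so the weight grows without bound whenever gamma > 0.5 and shrinks toward zero otherwise, even though the true value of both states is zero.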
References
Each is desirable in isolation. Their interaction, known as the deadly triad (Sutton and Barto, 2018, Ch. 11), is the central open problem in reinforcement learning theory.
— A Survey of Reinforcement Learning For Economics
(2603.08956 - Rawat, 9 Mar 2026) in Section “The Central Challenge: The Deadly Triad”