Unbiased divide-and-conquer value learning in stochastic environments
Determine whether a divide-and-conquer value learning technique, such as Transitive RL's triangle-inequality-based update, can be applied to goal-conditioned reinforcement learning (GCRL) in stochastic environments to learn unbiased value functions.
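To make the question concrete, the toy sketch below shows one way bias can enter when goal-conditioned values are composed through a waypoint. It is an illustration under stated assumptions, not the TRL algorithm or the paper's analysis: the 3-state chain, the discount of 0.9, the convention V(s, g) = E[gamma^T] with T the first hitting time of g, and the helper names `true_values` and `transitive_values` are all hypothetical. The "transitive" estimate assigns gamma to every observed transition, as if it were deterministic, and then applies the max-product rule V(s, g) >= V(s, w) * V(w, g).

```python
GAMMA = 0.9

# Hypothetical 3-state chain with a single action (no control).
# From state 0 the agent reaches state 1 only half the time.
P = {
    0: {0: 0.5, 1: 0.5},  # stochastic step
    1: {2: 1.0},          # deterministic step
    2: {2: 1.0},          # absorbing
}
STATES = sorted(P)


def true_values(P, gamma, iters=500):
    """V[s][g] = E[gamma ** T], where T is the first hitting time of g from s."""
    V = {s: {g: 1.0 if s == g else 0.0 for g in STATES} for s in STATES}
    for _ in range(iters):
        for g in STATES:
            for s in STATES:
                if s != g:
                    V[s][g] = gamma * sum(p * V[s2][g] for s2, p in P[s].items())
    return V


def transitive_values(P, gamma):
    """Max-product closure that treats every observed transition as certain."""
    V = {s: {g: 1.0 if s == g else 0.0 for g in STATES} for s in STATES}
    for s in STATES:
        for s2 in P[s]:
            if s2 != s:
                V[s][s2] = gamma  # one observed step, assumed to always succeed
    # Floyd-Warshall-style pass: V(s, g) <- max(V(s, g), V(s, w) * V(w, g)).
    for w in STATES:
        for s in STATES:
            for g in STATES:
                V[s][g] = max(V[s][g], V[s][w] * V[w][g])
    return V


V_true = true_values(P, GAMMA)
V_trans = transitive_values(P, GAMMA)

print(f"true V(0, 2)       = {V_true[0][2]:.4f}")   # 0.7364
print(f"transitive V(0, 2) = {V_trans[0][2]:.4f}")  # 0.8100
```

On the deterministic step the two estimates agree (V(1, 2) = 0.9 in both), while on the stochastic step the composed estimate is optimistic. In this toy, the composition rule itself is exact when applied to the true values, since V_true(0, 1) * V_true(1, 2) equals V_true(0, 2); the optimism comes from the one-step terms that ignore the failure probability.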
References
For example, it remains an open question whether a similar divide-and-conquer value learning technique could be applied to learn an unbiased value function in stochastic environments (which is a limitation of TRL, as well as many other works leveraging the vanilla triangle inequality in GCRL).
Source: Park et al., "Transitive RL: Value Learning via Divide and Conquer" (arXiv:2510.22512, 26 Oct 2025), Section 6: What's Next?