
Extending Transitive RL to general reward-based reinforcement learning

Establish whether Transitive RL or other divide-and-conquer-style value learning algorithms can be generalized from goal-conditioned reinforcement learning to general reward-based reinforcement learning tasks.


Background

Transitive RL (TRL) targets goal-conditioned tasks by exploiting the triangle inequality over temporal distances, and it demonstrates improved scalability on long-horizon problems. However, many practical reinforcement learning problems are defined by general reward structures rather than goal-reaching objectives.
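To make the horizon benefit concrete, the sketch below (an illustration of the general divide-and-conquer idea, not the authors' algorithm) compares two ways of propagating temporal distances on a small deterministic chain: one-step Bellman-style backups, which extend the known distance by a single edge per sweep, and a transitive update d(s, g) ← min_w d(s, w) + d(w, g), which composes two already-learned distances and so roughly halves the remaining horizon per sweep. All names here (`one_step_sweeps`, `transitive_sweeps`) are hypothetical.

```python
# Illustrative divide-and-conquer value propagation on temporal distances.
# NOT the TRL algorithm itself: a min-plus shortest-path analogy showing why
# composing two learned distances converges in O(log horizon) sweeps, while
# one-step backups need O(horizon) sweeps.
INF = float("inf")

def one_step_sweeps(adj, n):
    """One-step backups: each sweep extends distances by a single edge."""
    d = [[0 if i == j else adj[i][j] for j in range(n)] for i in range(n)]
    sweeps = 0
    while True:
        nd = [[min(d[i][j], min(adj[i][k] + d[k][j] for k in range(n)))
               for j in range(n)] for i in range(n)]
        sweeps += 1
        if nd == d:
            return d, sweeps
        d = nd

def transitive_sweeps(adj, n):
    """Transitive backups: each sweep composes two learned distances,
    i.e. d(s, g) <- min over waypoints w of d(s, w) + d(w, g)."""
    d = [[0 if i == j else adj[i][j] for j in range(n)] for i in range(n)]
    sweeps = 0
    while True:
        nd = [[min(d[i][k] + d[k][j] for k in range(n))
               for j in range(n)] for i in range(n)]
        sweeps += 1
        if nd == d:
            return d, sweeps
        d = nd

# A length-8 chain: state 0 -> 1 -> ... -> 8, each edge costing 1 step.
n = 9
adj = [[1 if j == i + 1 else (0 if i == j else INF) for j in range(n)]
       for i in range(n)]
d1, s1 = one_step_sweeps(adj, n)
d2, s2 = transitive_sweeps(adj, n)
print(d1[0][8], s1)  # -> 8 8  (distance 8, ~horizon many sweeps)
print(d2[0][8], s2)  # -> 8 4  (same distance, ~log2(horizon) sweeps)
```

The triangle inequality d(s, g) ≤ d(s, w) + d(w, g) holds for every waypoint w and becomes an equality at the optimal one, which is what licenses the minimization in the transitive update; extending this composition rule beyond distances, to general reward-based value functions, is precisely the open question.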

The authors explicitly identify the extension of TRL—or divide-and-conquer value learning more broadly—to general reward-based RL as an open direction, indicating that additional theory or algorithmic mechanisms may be needed outside the goal-conditioned framework.

References

Another open question is whether TRL (or any divide-and-conquer-style algorithm) can be extended to general reward-based RL tasks, beyond goal-conditioned RL.

Transitive RL: Value Learning via Divide and Conquer (2510.22512 - Park et al., 26 Oct 2025) in Section 6: What's Next?