Extend TD representation-learning guarantees from single-policy to multi-policy settings
Determine whether temporal-difference-based representation learning methods that recover low-rank decompositions of the successor measure in single-policy settings extend to multiple policies, establishing theoretical guarantees that hold across a family of policies rather than a single policy.
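To make the objects concrete, here is a minimal tabular sketch. It assumes a small random MDP and exact expected TD updates, with no sampling and no function approximation. For a single policy π, TD learning on the successor matrix converges to M^π = (I − γP_π)^{-1} = Σ_t γ^t P_π^t. The multi-policy question is then illustrated under an assumed formulation: how well does one rank-d basis over future states, shared by several policies, reconstruct all of their successor matrices? The shared basis is computed with an SVD rather than learned by TD, so this is a diagnostic of what a shared representation could achieve, not a TD guarantee. The helper names (`policy_transition`, `td_successor_matrix`, `shared_low_rank_error`) and the shared-right-factor formulation are hypothetical illustrations, not the method of the cited work.

```python
import numpy as np


def policy_transition(P, pi):
    """State-to-state transitions under pi: P_pi[s, t] = sum_a pi[s, a] * P[s, a, t]."""
    return np.einsum("sa,sat->st", pi, P)


def td_successor_matrix(P_pi, gamma, lr=0.5, iters=2000):
    """Expected (synchronous, tabular) TD updates for the successor matrix.

    Each step moves M toward the TD target I + gamma * P_pi @ M; the unique
    fixed point is M = (I - gamma * P_pi)^{-1} = sum_t gamma^t P_pi^t.
    """
    n = P_pi.shape[0]
    M = np.zeros((n, n))
    eye = np.eye(n)
    for _ in range(iters):
        M += lr * (eye + gamma * P_pi @ M - M)
    return M


def shared_low_rank_error(Ms, d):
    """Relative error of approximating every M_i as (M_i @ Psi) @ Psi.T, where one
    rank-d basis Psi over future states is shared by all policies.
    This shared-right-factor formulation is an assumption, for illustration only."""
    stacked = np.vstack(Ms)                      # shape (k * n, n)
    _, _, Vt = np.linalg.svd(stacked, full_matrices=False)
    Psi = Vt[:d].T                               # shape (n, d), orthonormal columns
    approx = stacked @ Psi @ Psi.T               # best rank-d fit in this row space
    return np.linalg.norm(stacked - approx) / np.linalg.norm(stacked)


rng = np.random.default_rng(0)
n_states, n_actions, gamma = 6, 3, 0.9
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))   # P[s, a, :] sums to 1
policies = [rng.dirichlet(np.ones(n_actions), size=n_states) for _ in range(4)]

Ms = []
for pi in policies:
    P_pi = policy_transition(P, pi)
    M_td = td_successor_matrix(P_pi, gamma)
    M_closed = np.linalg.inv(np.eye(n_states) - gamma * P_pi)
    assert np.allclose(M_td, M_closed, atol=1e-8)   # single-policy TD recovers M^pi
    Ms.append(M_td)

# Error of the shared basis as its rank d grows (exact at d = n_states).
for d in range(1, n_states + 1):
    print(d, round(shared_low_rank_error(Ms, d), 4))
```

The single-policy check is the classical fact that the TD fixed point is the successor matrix; the printed curve only shows how much a single shared basis loses at each rank for this particular random MDP and policy family.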
References
These works crucially rely on having a single policy, and it remains an open question whether such results extend to multiple policies. In this direction, we provide a first result which connects latent-predictive TD learning with TD learning over the successor measure for multiple policies.
— TD-JEPA: Latent-predictive Representations for Zero-Shot Reinforcement Learning
(2510.00739 - Bagatella et al., 1 Oct 2025) in Appendix: Extended Related Work, Theory of latent-predictive representations