Extend TD representation-learning guarantees from single-policy to multi-policy settings

Determine whether temporal-difference-based representation learning methods that recover low-rank decompositions of the successor measure in single-policy settings extend to multiple policies, establishing theoretical guarantees that hold across a family of policies rather than a single policy.

Background

Prior analyses of temporal-difference representation learning have shown that, under certain parameterizations and assumptions, TD-based objectives can recover low-rank decompositions of the successor measure, but these results are restricted to a single fixed policy.

This paper studies multi-policy settings and provides an initial connection between latent-predictive TD learning and TD learning over the successor measure for multiple policies, but the general extension of single-policy theoretical guarantees to the multi-policy case remains unresolved.
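To make the single-policy object concrete, the following is a minimal tabular sketch (not the paper's method; the MDP, discount, and rank are illustrative assumptions). It computes the successor representation M^pi = (I - gamma * P_pi)^{-1} for one fixed policy, checks its TD fixed-point identity M = I + gamma * P_pi * M, and forms the kind of low-rank factorization M ≈ Phi Psi^T that the single-policy guarantees concern.

```python
import numpy as np

rng = np.random.default_rng(0)
n, gamma = 6, 0.9  # assumed toy sizes, for illustration only

# Random transition matrix under a fixed policy pi (rows sum to 1).
P = rng.random((n, n))
P /= P.sum(axis=1, keepdims=True)

# Closed-form successor representation for this single policy.
M = np.linalg.inv(np.eye(n) - gamma * P)

# TD fixed-point check: M satisfies M = I + gamma * P @ M,
# the tabular analogue of the TD objective's fixed point.
residual = np.max(np.abs(M - (np.eye(n) + gamma * P @ M)))
print(residual < 1e-8)  # True

# A rank-k factorization M ≈ Phi @ Psi.T via SVD illustrates the
# low-rank decomposition TD-based representation learning recovers.
U, S, Vt = np.linalg.svd(M)
k = n  # full rank recovers M exactly; smaller k approximates it
Phi = U[:, :k] * S[:k]
Psi = Vt[:k].T
print(np.allclose(M, Phi @ Psi.T))  # True
```

The open problem asks whether analogous recovery guarantees can hold when Phi and Psi must serve a whole family of policies pi, each inducing a different P_pi and hence a different M^pi, rather than the single P fixed above.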

References

These works crucially rely on having a single policy, and it remains an open question whether such results extend to multiple policies. In this direction, we provide a first result that connects latent-predictive TD learning with TD learning over the successor measure for multiple policies.

TD-JEPA: Latent-predictive Representations for Zero-Shot Reinforcement Learning  (2510.00739 - Bagatella et al., 1 Oct 2025) in Appendix: Extended Related Work, Theory of latent-predictive representations