Cause of higher early-timestep state-prediction error in image-based locomotion
Investigate whether the higher state-prediction error observed at early timesteps in DeepMind Control Suite locomotion tasks arises from a nearly uniform initial state distribution, which makes occlusion-heavy observations more likely, rather than from other factors such as model capacity or parameter sharing.
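One way to probe this conjecture is to compare the early-versus-late error gap between episodes that start in an occluded configuration and those that do not. The following is a minimal sketch, not the cited paper's procedure: it assumes an array of per-episode, per-timestep prediction errors and a hypothetical boolean label `occluded_at_start` for each episode, and the function names are illustrative only.

```python
import numpy as np


def per_timestep_error(errors):
    """Mean state-prediction error at each timestep.

    `errors` has shape (episodes, timesteps).
    """
    return np.asarray(errors, dtype=float).mean(axis=0)


def early_late_gap(errors, early_steps):
    """Mean error over the first `early_steps` timesteps minus the mean over the rest."""
    curve = per_timestep_error(errors)
    return curve[:early_steps].mean() - curve[early_steps:].mean()


def gap_by_initial_occlusion(errors, occluded_at_start, early_steps):
    """Early/late gap computed separately for occluded and visible starts.

    `occluded_at_start` is a hypothetical per-episode label (assumption):
    True if the initial state produces an occlusion-heavy observation.
    """
    errors = np.asarray(errors, dtype=float)
    mask = np.asarray(occluded_at_start, dtype=bool)
    return {
        "occluded": early_late_gap(errors[mask], early_steps),
        "visible": early_late_gap(errors[~mask], early_steps),
    }


# Synthetic placeholder data (not experimental results): 4 episodes, 4 timesteps.
errors = [
    [4.0, 2.0, 1.0, 1.0],
    [2.0, 2.0, 1.0, 1.0],
    [1.0, 1.0, 1.0, 1.0],
    [1.0, 1.0, 1.0, 1.0],
]
occluded_at_start = [True, True, False, False]
gaps = gap_by_initial_occlusion(errors, occluded_at_start, early_steps=2)
```

If the gap is concentrated in the occluded-start group and close to zero otherwise, that would be consistent with the conjecture. A gap of similar size in both groups would instead point toward other factors such as model capacity or parameter sharing.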
References
We conjecture that the higher error at early timesteps is due to a nearly uniform distribution for the initial state distribution, where states that induce occlusion may be quite likely.
— To Distill or Decide? Understanding the Algorithmic Trade-off in Partially Observable Reinforcement Learning
(2510.03207 - Song et al., 3 Oct 2025) in Appendix, Section "Misspecification of Decodability in Practice" (app:exp-mis-dec)