Cause of higher early-timestep state-prediction error in image-based locomotion

Investigate whether the higher state-prediction error observed at early timesteps in DeepMind Control Suite locomotion tasks arises because a nearly uniform initial state distribution makes occlusion-heavy observations likely, rather than from other factors such as model capacity or parameter sharing.

Background

The authors assess the plausibility of perfect decodability by training state-prediction models from stacked observations and find large errors at initial timesteps across walker-run, humanoid-walk, and dog-walk tasks.

They hypothesize a specific mechanism, namely that a nearly uniform initial state distribution leads to occluded or uninformative early observations, but present it as a conjecture rather than a confirmed explanation.
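
The per-timestep comparison behind this observation can be illustrated with a minimal sketch. It assumes predicted and ground-truth states are available as arrays of shape (episodes, timesteps, state_dim) and uses mean squared error; the source does not specify the metric, and the function names, the array layout, and the `n_early` cutoff are all hypothetical choices made for illustration, not the authors' evaluation code.

```python
import numpy as np


def per_timestep_mse(pred_states, true_states):
    """Mean squared state-prediction error at each timestep.

    Both inputs are assumed to have shape (episodes, timesteps, state_dim).
    The error is averaged over episodes and state dimensions, leaving one
    value per timestep.
    """
    pred = np.asarray(pred_states, dtype=float)
    true = np.asarray(true_states, dtype=float)
    if pred.ndim != 3 or pred.shape != true.shape:
        raise ValueError("expected two arrays of identical 3-D shape")
    return ((pred - true) ** 2).mean(axis=(0, 2))


def early_late_ratio(errors, n_early=5):
    """Ratio of mean error over the first n_early timesteps to the rest.

    n_early is an illustrative cutoff, not a value taken from the paper.
    A ratio well above 1 indicates that early timesteps are harder to predict.
    """
    errors = np.asarray(errors, dtype=float)
    if not 0 < n_early < errors.size:
        raise ValueError("n_early must split the horizon into two non-empty parts")
    return errors[:n_early].mean() / errors[n_early:].mean()
```

A large ratio only reproduces the observation; it does not identify the cause. Separating the occlusion explanation from model capacity or parameter sharing would need a further comparison, for example repeating the measurement under a narrower initial state distribution.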

References

We conjecture that the higher error at early timesteps is due to a nearly uniform distribution for the initial state distribution, where states that induce occlusion may be quite likely.

To Distill or Decide? Understanding the Algorithmic Trade-off in Partially Observable Reinforcement Learning (2510.03207 - Song et al., 3 Oct 2025) in Appendix, Section "Misspecification of Decodability in Practice" (app:exp-mis-dec)