Cause of degraded action mapping when using latent policy outputs
Determine whether the degradation in downstream action-mapping performance in the Latent Particle World Model (LPWM), observed when per-particle latent actions from the CONTEXT module’s latent policy head (rather than from the latent inverse dynamics head) are mapped to global actions, is caused by distribution mismatch or by higher noise in the latent policy head’s outputs. Characterize the mechanism underlying this discrepancy.
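One possible first diagnostic, sketched here under stated assumptions rather than taken from the LPWM work: if the latent actions are continuous vectors and both heads can be evaluated on the same transitions, the mean squared gap between their outputs splits exactly into a systematic shift term and a spread term. The function name `decompose_gap` and the inputs `z_inv` and `z_pol` (arrays of shape `(N, D)`) are hypothetical. A shift-dominated gap would point toward a consistent offset between the heads, while a spread-dominated gap would point toward sample-to-sample noise; neither result alone would establish the cause of the downstream degradation.

```python
import numpy as np

def decompose_gap(z_inv, z_pol):
    """Split the mean squared gap between paired latent actions into shift and spread.

    z_inv, z_pol: hypothetical arrays of shape (N, D) holding latent actions from
    the inverse dynamics head and the policy head on the same N transitions.
    """
    d = np.asarray(z_pol, dtype=float) - np.asarray(z_inv, dtype=float)  # (N, D)
    # Mean squared Euclidean distance between paired outputs.
    mse = float(np.mean(np.sum(d ** 2, axis=1)))
    # Squared norm of the mean difference: a systematic offset between the heads.
    shift = float(np.sum(d.mean(axis=0) ** 2))
    # Summed per-dimension population variance of the difference: scatter around that offset.
    spread = float(np.sum(d.var(axis=0)))
    return {"mse": mse, "shift": shift, "spread": spread}

# Synthetic illustration only: a constant offset plus zero-mean Gaussian noise.
rng = np.random.default_rng(0)
z_inv_demo = rng.normal(size=(1000, 4))
z_pol_demo = z_inv_demo + 0.5 + 0.1 * rng.normal(size=(1000, 4))
gap_demo = decompose_gap(z_inv_demo, z_pol_demo)
```

By construction `mse` equals `shift + spread` up to floating-point error, so the two terms can be read as fractions of the total gap. The decomposition describes only the inputs to the mapping network; how sensitive the mapping is to each kind of deviation would need a separate test.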
References
Notably, we empirically found that directly using the latent policy outputs for mapping degrades downstream performance; the mapping network performs best when evaluated on the outputs of the latent inverse module, as this matches the distribution seen during training. The difference may be due to distribution mismatch or higher noise from the latent policy predictor—a question we leave for future investigation.