Mechanism underlying task-dependent differences between sequence and feature action conditioning

Ascertain the underlying mechanism that causes sequence conditioning (actions encoded as tokens concatenated along the sequence dimension, with Rotary Position Embeddings) to outperform feature conditioning (actions concatenated along the embedding dimension) on the DROID, Robocasa, and Metaworld manipulation tasks, while feature conditioning outperforms sequence conditioning on the Wall 2D navigation task. These differences persist even when the action-to-visual dimensional ratios are matched in JEPA world model predictors.

Background

The paper studies several predictor action-conditioning schemes within Joint-Embedding Predictive World Models (JEPA-WMs), including feature conditioning, sequence conditioning, and AdaLN conditioning. To isolate the effect of conditioning from capacity differences due to the proportion of action information, the authors conduct equalized action ratio experiments: they downscale the image resolution to increase the relative action proportion for sequence conditioning and match it to feature conditioning, keeping the predictor architecture and RoPE positional encoding identical.
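The two schemes and the ratio-matching idea can be illustrated with a small shape-level sketch. Everything in it is an assumption made for illustration, not the paper's implementation: the function names are hypothetical, the image, patch, embedding, and action sizes are arbitrary, and the "action ratio" is taken here to mean the fraction of predictor input devoted to actions (action dimensions over total embedding dimensions for feature conditioning, action tokens over total tokens for sequence conditioning). The paper may define the ratio differently.

```python
# Shape-level sketch of feature vs. sequence action conditioning.
# All names and sizes are hypothetical; the ratio definitions are assumptions.

def num_patches(image_size: int, patch_size: int) -> int:
    """Number of visual tokens for a square image split into square patches."""
    side = image_size // patch_size
    return side * side


def feature_conditioning_shape(num_tokens: int, embed_dim: int, action_dim: int):
    """Action vector concatenated to every visual token along the embedding axis."""
    return (num_tokens, embed_dim + action_dim)


def sequence_conditioning_shape(num_tokens: int, embed_dim: int, num_action_tokens: int = 1):
    """Action encoded as extra token(s) appended along the sequence axis."""
    return (num_tokens + num_action_tokens, embed_dim)


def feature_action_ratio(embed_dim: int, action_dim: int) -> float:
    """Assumed definition: share of each token's channels that carry the action."""
    return action_dim / (embed_dim + action_dim)


def sequence_action_ratio(num_tokens: int, num_action_tokens: int = 1) -> float:
    """Assumed definition: share of the sequence made up of action tokens."""
    return num_action_tokens / (num_tokens + num_action_tokens)


if __name__ == "__main__":
    embed_dim, action_dim = 384, 6  # hypothetical sizes
    for image_size in (224, 128, 64):  # downscaling the image shrinks the token count
        n = num_patches(image_size, patch_size=16)
        print(
            image_size,
            feature_conditioning_shape(n, embed_dim, action_dim),
            sequence_conditioning_shape(n, embed_dim),
            round(feature_action_ratio(embed_dim, action_dim), 4),
            round(sequence_action_ratio(n), 4),
        )
```

Under these assumed definitions, the feature-conditioning ratio does not depend on image resolution, while the sequence-conditioning ratio grows as the image is downscaled and the number of visual tokens drops. This is the lever the equalized experiments are described as using to bring the two ratios together.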

Despite matched action-to-visual dimensional ratios, the experiments show task-dependent performance. Sequence conditioning clearly outperforms feature conditioning on DROID, Robocasa, and Metaworld (manipulation-heavy tasks), whereas feature conditioning significantly outperforms sequence conditioning on the Wall (2D navigation) task. Performance is comparable on Maze and Push-T. The authors note that rollout prediction losses are similar, which implies that the difference emerges during planning rather than prediction, potentially because of how action information propagates through the predictor (attention-based routing vs. per-token modulation).

References

“We cannot provide a precise explanation of the underlying mechanism explaining why we observe such differences.”

What Drives Success in Physical Planning with Joint-Embedding Predictive World Models? (2512.24497 - Terver et al., 30 Dec 2025) in Appendix, Section 6.1 (Additional experiments), paragraph “Equalized action ratio experiments”