Matching reconstruction with decoder-free prediction in high-fidelity visual tasks

Determine whether decoder-free, prediction-based representation learning objectives for world models can match the performance of reconstruction-based pixel-decoder objectives on high-fidelity visual tasks where fine-grained visual detail is critical.

Background

NE-Dreamer replaces pixel-level reconstruction with next-embedding prediction, using a causal temporal transformer and a redundancy-reduction alignment loss. It demonstrates strong gains in partially observable, memory-intensive environments and parity on standard continuous control benchmarks.
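To make the idea of a decoder-free objective concrete, the following is a minimal sketch of aligning predicted next embeddings with encoder embeddings. The exact loss used by NE-Dreamer is not given here; this sketch assumes a Barlow Twins-style cross-correlation loss, which is one common redundancy-reduction formulation. The function names, the tensor layout (batch, time, dim), and the `off_diag_weight` value are all hypothetical, and the causal transformer that would produce `pred_seq` is omitted.

```python
import numpy as np


def next_embedding_pairs(pred_seq, emb_seq):
    """Pair the prediction made at step t with the encoder embedding at step t+1.

    Assumed layout: both arrays are (batch, time, dim).
    Returns two (batch * (time - 1), dim) arrays.
    """
    dim = emb_seq.shape[-1]
    pred = pred_seq[:, :-1].reshape(-1, dim)   # predictions for steps 1..T-1
    target = emb_seq[:, 1:].reshape(-1, dim)   # embeddings at steps 1..T-1
    return pred, target


def redundancy_reduction_loss(pred, target, off_diag_weight=5e-3, eps=1e-6):
    """Barlow Twins-style loss (an assumption, not the paper's confirmed loss).

    Pushes the cross-correlation matrix between predictions and targets
    toward the identity: diagonal terms align matching features, and
    off-diagonal terms penalize redundancy between different features.
    """
    # Standardize each feature over the sample axis.
    pred = (pred - pred.mean(axis=0)) / (pred.std(axis=0) + eps)
    target = (target - target.mean(axis=0)) / (target.std(axis=0) + eps)

    n = pred.shape[0]
    corr = pred.T @ target / n                 # (dim, dim) cross-correlation
    diag = np.diag(corr)

    on_diag = np.sum((diag - 1.0) ** 2)        # alignment term
    off_diag = np.sum(corr ** 2) - np.sum(diag ** 2)  # redundancy term
    return on_diag + off_diag_weight * off_diag
```

No pixel decoder appears anywhere in this objective: the learning signal comes entirely from embedding space. In a real training loop the target embeddings would typically be treated as fixed (no gradient), which this NumPy sketch does not model.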

The experiments primarily target domains where long-term structure matters more than fine visual detail. This leaves open whether decoder-free, prediction-based objectives can be as effective as reconstruction-based objectives in settings where precise visual detail is essential.

References

Whether decoder-free, prediction-based objectives can match reconstruction in high-fidelity tasks remains open.

Next Embedding Prediction Makes World Models Stronger (2603.02765 - Bredis et al., 3 Mar 2026) in Discussion