Wide spatial coverage and stable temporal dynamics without curated multi-view data
Establish how to synthesize high-quality 4D novel-view videos that simultaneously achieve wide spatial coverage across large viewpoint changes and stable temporal dynamics over long sequences, using only single-view monocular inputs and without relying on curated multi-view training data.
References
Achieving wide spatial coverage and stable temporal dynamics without curated multi-view data, therefore, remains open, a gap our pose-free, auto-regressive framework seeks to close.
— SEE4D: Pose-Free 4D Generation via Auto-Regressive Video Inpainting
(2510.26796 - Lu et al., 30 Oct 2025) in Section 2.2 (Generative Novel View Synthesis)