DINO_4D: Semantic-Aware 4D Reconstruction

Published 10 Apr 2026 in cs.CV, cs.AI, and cs.RO | (2604.09877v1)

Abstract: At the intersection of computer vision and robotic perception, 4D reconstruction of dynamic scenes serves as the critical bridge connecting low-level geometric sensing with high-level semantic understanding. We present DINO_4D, which introduces frozen DINOv3 features as structural priors, injecting semantic awareness into the reconstruction process to suppress semantic drift during dynamic tracking. Experiments on the Point Odyssey and TUM-Dynamics benchmarks demonstrate that our method maintains the linear time complexity $O(T)$ of its predecessors while significantly improving Tracking Accuracy (APD) and Reconstruction Completeness. DINO_4D establishes a new paradigm for constructing 4D World Models that possess both geometric precision and semantic understanding.