Effect of Scaling Video Foundation Model Training Data on MV-VDP Generalization
Determine whether pretraining Wan2.2, the video foundation model that serves as the backbone of MV-VDP, on more training data improves MV-VDP's generalization to unseen tasks and visual variations.
References
We further conjecture that scaling up the training data for the video foundation model could similarly improve the generalization ability of our approach.
— Multi-View Video Diffusion Policy: A 3D Spatio-Temporal-Aware Video Action Model
(2604.03181 - Li et al., 3 Apr 2026) in Experiments — Real-World Experiments, Results (Generalization to unseen tasks)