Is trajectory diversity the core driver of effective VLA pre-training?
Establish whether trajectory diversity, which in the InternData-A1 dataset arises from multi-skill compositions and articulated manipulation, is the primary factor behind effective pre-training of Vision-Language-Action (VLA) models. The alternative is that other dataset-composition factors drive performance, such as the share of pick-and-place tasks, base tasks, or long-horizon tasks.
References
At a higher level, combining these two findings, we hypothesize that trajectory diversity may serve as the core drive of effective pre-training. We leave a rigorous investigation for future research.
— InternData-A1: Pioneering High-Fidelity Synthetic Data for Pre-training Generalist Policy
(2511.16651 - Tian et al., 20 Nov 2025) in Section 6: Data Analysis, Data Component Ablation