Scaling laws and embodiment-variability interaction in VLA models
Determine the scaling laws that govern the performance and generalization of Vision-Language-Action (VLA) models as model size, data diversity, and data volume increase. In addition, ascertain how embodiment-specific variability in hardware configurations interacts with model capacity during large-scale training and deployment.
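One way to probe such scaling laws empirically is to fit a saturating power law of the form L(N) = a * N^(-alpha) + c to validation loss measured at several model sizes. The sketch below is a minimal illustration of that procedure; the functional form, parameter names, and data points are assumptions for demonstration, not results or methods from the cited paper.

```python
import numpy as np
from scipy.optimize import curve_fit

def power_law(n_params, a, alpha, c):
    """Saturating power law L(N) = a * N^(-alpha) + c, a common scaling-law ansatz (assumed form)."""
    return a * n_params ** (-alpha) + c

# Hypothetical (model size, validation loss) measurements -- illustrative only.
model_sizes = np.array([1e8, 3e8, 1e9, 3e9, 1e10])
val_losses = np.array([2.10, 1.85, 1.62, 1.48, 1.39])

# Fit the power law; p0 gives the optimizer a rough starting point.
popt, _ = curve_fit(power_law, model_sizes, val_losses, p0=[10.0, 0.1, 1.0], maxfev=10000)
a, alpha, c = popt
print(f"fitted: a={a:.3g}, alpha={alpha:.3g}, irreducible loss c={c:.3g}")

# Extrapolate to a larger model to see what the fitted law predicts.
print(f"predicted loss at 3e10 params: {power_law(3e10, *popt):.3f}")
```

In practice, the same fit could be repeated per embodiment (or per hardware configuration) to see whether the fitted exponent alpha or irreducible loss c shifts with embodiment-specific variability, which is the interaction the open question asks about.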
References
Such extensions also raise open questions about the scaling laws of VLA models and how embodiment-specific variability interacts with model capacity.
— X-VLA: Soft-Prompted Transformer as Scalable Cross-Embodiment Vision-Language-Action Model
(arXiv:2510.10274, Zheng et al., 11 Oct 2025), Appendix, Section "Limitations and future works"