Training a fully native VLM entirely from scratch
Develop a training procedure for a fully native vision-language model such as NEO, trained entirely from scratch without initialization from any pretrained large language model. A sketch of what such a model and training step could look like is given below.
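To make the problem concrete, here is a minimal PyTorch sketch of one possible "native" VLM setup: a single, randomly initialized transformer that consumes interleaved image patches and text tokens and is optimized with a next-token objective, with no weights copied from a pretrained LLM. The class name, hyperparameters, and layer choices (`NativeVLM`, `d_model`, a shared encoder stack with a causal mask, etc.) are illustrative assumptions and do not reflect NEO's actual architecture or training recipe.

```python
# Hypothetical sketch of a native VLM trained from scratch; not NEO's real design.
import torch
import torch.nn as nn

class NativeVLM(nn.Module):
    def __init__(self, vocab_size=32000, d_model=512, n_layers=8, n_heads=8,
                 patch_size=16, image_channels=3, max_len=2048):
        super().__init__()
        # Text tokens and image patches are embedded to the same width d_model,
        # so one shared transformer stack models both modalities from scratch.
        self.token_embed = nn.Embedding(vocab_size, d_model)
        self.patch_embed = nn.Conv2d(image_channels, d_model,
                                     kernel_size=patch_size, stride=patch_size)
        self.pos_embed = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, 4 * d_model,
                                           batch_first=True, norm_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.lm_head = nn.Linear(d_model, vocab_size, bias=False)

    def forward(self, images, text_ids):
        # images: (B, C, H, W); text_ids: (B, T)
        patches = self.patch_embed(images).flatten(2).transpose(1, 2)  # (B, P, d)
        tokens = self.token_embed(text_ids)                            # (B, T, d)
        x = torch.cat([patches, tokens], dim=1)        # vision-first interleaving
        pos = torch.arange(x.size(1), device=x.device)
        x = x + self.pos_embed(pos)
        # Causal mask so each position attends only to earlier positions.
        mask = nn.Transformer.generate_square_subsequent_mask(x.size(1)).to(x.device)
        h = self.blocks(x, mask=mask)
        return self.lm_head(h[:, patches.size(1):])    # predict text positions only

# Toy from-scratch training step: next-token prediction on the text part of the
# interleaved sequence, starting from random weights (no pretrained LLM init).
model = NativeVLM()
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
images = torch.randn(2, 3, 64, 64)
text = torch.randint(0, 32000, (2, 32))
logits = model(images, text[:, :-1])
loss = nn.functional.cross_entropy(logits.reshape(-1, logits.size(-1)),
                                    text[:, 1:].reshape(-1))
loss.backward()
opt.step()
```

The open question raised by the paper is not this loop itself but what corpus, curriculum, and compute budget would let such a from-scratch model match LLM-initialized VLMs without inheriting the language modality's dominance.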
References
Constrained by current text corpus and computational resources, we are unable to train a fully native model entirely from scratch without initialization from an existing LLM. This limitation also hinders our ability to mitigate potential biases arising from the dominance of the language modality.
— From Pixels to Words: Towards Native Vision-Language Primitives at Scale
(arXiv:2510.14979, Diao et al., 16 Oct 2025), Appendix, Subsection "Limitation and Discussion"