Retention of VLM Visual–Language Representations After VLA Adaptation
Determine to what extent pretrained Vision–Language Models (VLMs) preserve their original visual–language (VL) representations and world knowledge after they are adapted to the action modality, via supervised fine-tuning for robotic control, to form Vision–Language–Action (VLA) models.
References
Yet when these VLMs are adapted to the action modality, it remains unclear to what extent their original VL representations and knowledge are preserved.
— Don't Blind Your VLA: Aligning Visual Representations for OOD Generalization
(arXiv:2510.25616, Kachaev et al., 29 Oct 2025), Section 1: Introduction