Fine-tuning pre-trained vision-language-action policies with online rewards
Develop a robust, sample-efficient procedure for fine-tuning pre-trained vision-language-action (VLA) policy models with online reward feedback gathered during robot interaction, so that these models can learn new manipulation tasks in real-world settings without requiring task-specific demonstrations.
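The problem statement implies an online reinforcement-learning loop wrapped around a pre-trained policy. The sketch below illustrates the simplest version of that loop, REINFORCE fine-tuning against an online reward signal; it is a toy illustration under stated assumptions, not the ReWiND method or any real VLA API. `TinyPolicy`, `reward_fn`, and `rollout` are hypothetical stand-ins: a real system would load pre-trained VLA weights, step a physical robot or simulator, and score rollouts with a learned (e.g., language-conditioned) reward model.

```python
# Hedged sketch: online-reward fine-tuning of a (stand-in) pre-trained policy.
# TinyPolicy, reward_fn, and rollout are illustrative placeholders, not a
# real VLA model, robot environment, or the ReWiND reward model.
import torch
import torch.nn as nn

torch.manual_seed(0)

class TinyPolicy(nn.Module):
    """Stand-in for a pre-trained VLA policy: maps an observation embedding
    to a Gaussian over continuous actions."""
    def __init__(self, obs_dim=32, act_dim=7):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh())
        self.mu = nn.Linear(64, act_dim)
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def dist(self, obs):
        h = self.trunk(obs)
        return torch.distributions.Normal(self.mu(h), self.log_std.exp())

def reward_fn(obs, act):
    """Placeholder online reward (a real system might use a learned,
    language-conditioned progress reward). Toy: prefer small actions."""
    return -(act ** 2).sum(-1)

def rollout(policy, horizon=16, obs_dim=32):
    """Collect one on-policy episode; a real loop would step a robot/sim."""
    obs = torch.randn(obs_dim)
    logps, rewards = [], []
    for _ in range(horizon):
        d = policy.dist(obs)
        act = d.sample()
        logps.append(d.log_prob(act).sum())
        rewards.append(reward_fn(obs, act))
        obs = torch.randn(obs_dim)  # toy dynamics
    return torch.stack(logps), torch.stack(rewards)

policy = TinyPolicy()  # imagine loading pre-trained VLA weights here
opt = torch.optim.Adam(policy.parameters(), lr=3e-4)

for step in range(200):
    logps, rewards = rollout(policy)
    # Reward-to-go returns, normalized as a simple variance reduction.
    returns = torch.flip(torch.cumsum(torch.flip(rewards, [0]), 0), [0])
    adv = (returns - returns.mean()) / (returns.std() + 1e-8)
    # REINFORCE: increase log-probability of actions with high advantage.
    loss = -(logps * adv.detach()).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The open questions the statement names sit precisely in what this sketch omits: sample efficiency (on-policy REINFORCE is wasteful on real hardware) and robustness (fine-tuning without a regularizer, such as a KL penalty toward the pre-trained weights, risks catastrophic drift from the pre-trained behavior).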
References
However, the best way to fine-tune these models with online rewards remains an open challenge (Nakamoto et al., 2024; Guo et al., 2025).
— ReWiND: Language-Guided Rewards Teach Robot Policies without New Demonstrations
(arXiv:2505.10911, Zhang et al., 16 May 2025), Section 6: Limitations