Robustness of general-purpose VLMs as robot reward models
Determine whether state-of-the-art vision-language models, pretrained on large and diverse internet-scale datasets, can serve as reward models for real-world robotic reinforcement learning, providing rewards with the accuracy and reliability required for effective policy training.
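A minimal sketch of the setup under question, assuming a CLIP-style image-text similarity as the reward signal. This is one common instantiation of VLM-derived rewards, not the RoboReward method; the checkpoint name, task prompt, and reward scaling are illustrative assumptions.

```python
# Sketch: querying a CLIP-style VLM for a scalar reward in robotic RL.
# Illustrative only -- checkpoint, prompt, and scaling are assumptions.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

MODEL_NAME = "openai/clip-vit-base-patch32"  # assumed checkpoint
model = CLIPModel.from_pretrained(MODEL_NAME).eval()
processor = CLIPProcessor.from_pretrained(MODEL_NAME)

@torch.no_grad()
def vlm_reward(frame: Image.Image, task_prompt: str) -> float:
    """Score how well a camera observation matches a language-specified goal.

    Returns the cosine similarity between image and text embeddings in
    [-1, 1]; an RL training loop would typically rescale or threshold it.
    """
    inputs = processor(text=[task_prompt], images=frame,
                       return_tensors="pt", padding=True)
    image_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
    text_emb = model.get_text_features(input_ids=inputs["input_ids"],
                                       attention_mask=inputs["attention_mask"])
    image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
    return (image_emb @ text_emb.T).item()

# Usage (hypothetical pick-and-place step):
# r = vlm_reward(camera_frame, "the red block is inside the bin")
```

The open question is whether such scores are precise and reliable enough, across lighting, viewpoints, and task phrasings, to drive policy optimization rather than merely correlate with task progress.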
References
While VLMs are pretrained on large datasets drawn from a diverse set of sources, endowing them with general vision-language abilities, it is not clear that these general abilities enable them, at present, to robustly provide rewards at the level of precision and reliability required by RL training.
— RoboReward: General-Purpose Vision-Language Reward Models for Robotics
(arXiv:2601.00675, Lee et al., 2 Jan 2026), in the Introduction