Reliable real-world deployment of large-scale VLA policies

Establish reliable methods for deploying large-scale vision-language-action policies, including OpenVLA, pi_0, MolmoAct, and Gemini Robotics, in real-world robotic settings.

Background

The paper notes that while large-scale vision-language-action (VLA) policies such as OpenVLA, pi_0, MolmoAct, and Gemini Robotics have demonstrated strong language-conditioned manipulation capabilities across multiple embodiments, consistently reliable deployment outside lab settings has not been achieved.

This open problem motivates the work's focus on scalable, generalizable reward modeling, specifically zero-shot progress estimation from pretrained video VLMs, to enable reinforcement learning and policy improvement without brittle, hand-crafted rewards.
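For intuition, below is a minimal sketch of the kind of VLM-derived reward signal this direction points toward. It assumes a generic yes/no success query to a pretrained video VLM, normalizes the "yes"/"no" token probabilities into a per-frame progress estimate, and uses progress differences as dense rewards. The query format, function names, and difference-based shaping are illustrative assumptions, not TOPReward's exact formulation.

```python
import math
from typing import Sequence


def affirmative_token_probability(logprobs: dict[str, float]) -> float:
    """Turn token log-probabilities from a (hypothetical) video-VLM query,
    e.g. "Has the task '<instruction>' been completed? Answer yes or no.",
    into a scalar in [0, 1] by normalizing over the 'yes'/'no' tokens."""
    p_yes = math.exp(logprobs.get("yes", float("-inf")))
    p_no = math.exp(logprobs.get("no", float("-inf")))
    return p_yes / (p_yes + p_no) if (p_yes + p_no) > 0 else 0.0


def progress_rewards(per_frame_logprobs: Sequence[dict[str, float]]) -> list[float]:
    """Convert per-frame success probabilities into dense rewards as the change
    in estimated progress between consecutive frames (a common shaping choice,
    assumed here for illustration)."""
    progress = [affirmative_token_probability(lp) for lp in per_frame_logprobs]
    return [progress[t] - progress[t - 1] for t in range(1, len(progress))]


if __name__ == "__main__":
    # Stubbed VLM outputs for a short rollout: the model grows more confident
    # that the instructed task has been completed as the episode proceeds.
    fake_logprobs = [
        {"yes": math.log(0.10), "no": math.log(0.90)},
        {"yes": math.log(0.40), "no": math.log(0.60)},
        {"yes": math.log(0.85), "no": math.log(0.15)},
    ]
    print(progress_rewards(fake_logprobs))  # one dense reward per transition
```

Such a signal could, in principle, replace hand-crafted reward functions when fine-tuning or evaluating VLA policies, though calibration and robustness of the underlying VLM remain open questions.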

References

Large-scale vision-language-action policies such as OpenVLA, pi_0, MolmoAct and Gemini Robotics have demonstrated strong language-conditioned manipulation capabilities across diverse embodiments, yet reliably deploying them in real-world settings remains an open problem.

TOPReward: Token Probabilities as Hidden Zero-Shot Rewards for Robotics (2602.19313 - Chen et al., 22 Feb 2026) in Related Work — The reward bottleneck for VLA.