Systematic emergence of recovery behaviors in VLA models

Determine whether recovery behaviors emerge systematically in most deployment settings for generalist Vision-Language-Action (VLA) robot control models, and establish a rigorous evaluation procedure (for example, plotting test-time scaling curves over recovery maneuvers) to assess how consistently these behaviors appear across tasks and environments.

Background

RaC emphasizes training robot policies on interventions that pair recovery with correction, showing that such behaviors lead to robustness and test-time scaling on long-horizon tasks. The authors propose exploring how these ideas relate to generalist Vision-Language-Action (VLA) models, which are trained across diverse tasks and embodiments.

While prior results show isolated instances of recovery behavior in VLA models, it is unknown whether this behavior arises consistently and systematically across typical deployment scenarios. Clarifying this would tell the community how reliably VLA models handle out-of-distribution states and compounding errors, and would guide evaluation methodologies that capture test-time scaling via retries and recovery maneuvers.
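One way such a test-time scaling curve might be computed is sketched below. This is an illustration under stated assumptions, not the procedure used in the RaC paper: it assumes each rollout is logged as the number of recovery maneuvers used before the task succeeded (or None if it never succeeded), and it defines the curve as the fraction of rollouts that succeed within a budget of k maneuvers. The function names and the example logs are hypothetical.

```python
from typing import Dict, List, Optional, Sequence


def scaling_curve(
    recoveries_to_success: Sequence[Optional[int]], max_budget: int
) -> List[float]:
    """Success rate as a function of the allowed recovery budget.

    Entry k of the result is the fraction of rollouts that succeeded
    using at most k recovery maneuvers. A rollout logged as None
    never succeeded and counts as a failure at every budget.
    """
    if not recoveries_to_success:
        raise ValueError("need at least one rollout")
    n = len(recoveries_to_success)
    curve = []
    for k in range(max_budget + 1):
        solved = sum(
            1 for r in recoveries_to_success if r is not None and r <= k
        )
        curve.append(solved / n)
    return curve


def curves_by_setting(
    logs: Dict[str, Sequence[Optional[int]]], max_budget: int
) -> Dict[str, List[float]]:
    """Compute one curve per task or environment so they can be compared."""
    return {name: scaling_curve(rs, max_budget) for name, rs in logs.items()}


# Hypothetical logs: recovery maneuvers used before success, None = failure.
rollouts = [0, 2, None, 1, 0, 3, None, 1]
curve = scaling_curve(rollouts, max_budget=3)
print(curve)  # [0.25, 0.5, 0.625, 0.75]
```

Under this definition, a curve that keeps rising with the budget suggests the policy is making use of retries and recoveries, while a flat curve suggests it is not. Comparing the per-setting curves from `curves_by_setting` is one possible way to check whether the effect is consistent across tasks and environments rather than isolated.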

References

Finally, while prior results do show some examples of recovery behaviors in VLA models, it is unclear if such behaviors systematically emerge in most settings or not, and studying this aspect rigorously (for example, by plotting test-time scaling curves analogous to Figure~\ref{fig:lid_retries_vs_success}) is also useful for the community.

— RaC: Robot Learning for Long-Horizon Tasks by Scaling Recovery and Correction (2509.07953 - Hu et al., 9 Sep 2025) in Section 6: Discussion, Conclusion, and Future Work (Future work paragraph)
