Distinguish whether VLA task successes reflect competence or chance
Determine whether successful task executions by Visual Language Action (VLA) models on robotic manipulation benchmarks (e.g., VLATest on SimplerEnv scenarios) reflect learned model competence rather than stochastic chance, by establishing execution-level criteria or diagnostics that can make this distinction for each individual run.
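As a point of contrast with the per-run diagnostic sought here, the sketch below shows a simple aggregate baseline: re-execute the same scenario several times and ask how likely the observed number of successes would be under a chance-level policy. It does not solve the problem, since it says nothing about any single run. It is not taken from the cited paper. The names `binomial_tail` and `exceeds_chance`, the `chance_rate` parameter (for example, the success rate of a random-action policy on the same scenario), and the 0.05 threshold are all illustrative assumptions.

```python
from math import comb


def binomial_tail(successes: int, trials: int, chance_rate: float) -> float:
    """Return P(X >= successes) for X ~ Binomial(trials, chance_rate).

    chance_rate is a hypothetical per-run success probability of a
    chance-level policy (an assumption, not a value from the paper).
    """
    return sum(
        comb(trials, k) * chance_rate**k * (1 - chance_rate) ** (trials - k)
        for k in range(successes, trials + 1)
    )


def exceeds_chance(successes: int, trials: int, chance_rate: float,
                   alpha: float = 0.05) -> bool:
    """True if the observed successes are unlikely under the chance baseline."""
    return binomial_tail(successes, trials, chance_rate) < alpha


if __name__ == "__main__":
    # Illustrative numbers only: 9 successes in 10 repeated runs,
    # against an assumed 10% chance-level success rate.
    print(exceeds_chance(9, 10, 0.10))
```

A test of this kind can only say that a set of runs is collectively unlikely under the assumed baseline. Attributing one particular success to competence or chance is the part that remains open.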
References
Moreover, it was often unclear whether the task was completed successfully due to model competence or merely by chance.
— Evaluating Uncertainty and Quality of Visual Language Action-enabled Robots
(2507.17049 - Valle et al., 22 Jul 2025) in Introduction (Section 1)