Robustness of RoboArena to Adversarial Evaluators
Investigate the robustness of the RoboArena distributed, double-blind, pairwise robot policy evaluation framework to intentionally adversarial evaluators who attempt to tamper with results by providing random preference labels or misleading natural-language feedback, and develop methods to harden the evaluation process against such tampering.
References
While RoboArena's distributed, double-blind evaluation scheme gives it inherent robustness against influence by individual evaluators, we have not investigated its robustness to intentionally adversarial evaluators who try to tamper with evaluation results, for example by providing random preference ratings or intentionally misleading language feedback. Future work should investigate how distributed robot evaluation approaches can be hardened against such tampering.
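One simple hardening direction the excerpt suggests is screening for evaluators whose preference labels look random. The sketch below is a hypothetical illustration, not part of RoboArena: it flags evaluators whose agreement with the per-pair majority vote stays near chance level (a random labeler agrees with consensus about 50% of the time, while honest evaluators agree far more often). The function names (`consensus_labels`, `flag_low_agreement`), thresholds, and the toy simulation are all assumptions for the sake of the example.

```python
import random
from collections import defaultdict

def consensus_labels(votes):
    """Majority preference per policy pair.

    votes maps pair_id -> list of (evaluator_id, preference),
    where preference is 0 or 1 (which of the two policies won).
    """
    return {pair: int(sum(p for _, p in vs) * 2 >= len(vs))
            for pair, vs in votes.items()}

def flag_low_agreement(votes, min_votes=20, threshold=0.6):
    """Flag evaluators whose agreement rate with the per-pair majority
    falls below `threshold`. Random labeling hovers near 0.5, so it is
    separated from honest labeling once an evaluator has enough votes."""
    consensus = consensus_labels(votes)
    agree, total = defaultdict(int), defaultdict(int)
    for pair, vs in votes.items():
        for ev, p in vs:
            total[ev] += 1
            agree[ev] += int(p == consensus[pair])
    return {ev for ev in total
            if total[ev] >= min_votes and agree[ev] / total[ev] < threshold}

# Toy simulation: five honest evaluators report the true preference with
# 90% accuracy; one adversarial evaluator ("adv") labels at random.
random.seed(0)
votes = defaultdict(list)
for pair in range(200):
    truth = random.randint(0, 1)
    for ev in ["e1", "e2", "e3", "e4", "e5"]:
        p = truth if random.random() < 0.9 else 1 - truth
        votes[pair].append((ev, p))
    votes[pair].append(("adv", random.randint(0, 1)))

flagged = flag_low_agreement(votes)
print(flagged)
```

A per-pair majority is a crude consensus signal; a real defense would likely combine it with statistical tests on each evaluator's agreement rate and with robust rank aggregation, and would need care around pairs where honest evaluators genuinely disagree. Misleading language feedback is harder to screen automatically and is left open here, as in the excerpt.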