Existence of Loopholes in PaperBench Evaluation

Determine whether the PaperBench evaluation, including its rubrics and LLM-based judging process, contains exploitable loopholes that could produce false negatives or false positives, given the large number of rubric nodes and the complexity of paper replication.

Background

Although PaperBench rubrics are designed to avoid false positives and false negatives, the evaluation spans thousands of requirements across complex ML replication tasks, which could introduce unforeseen vulnerabilities.

The authors highlight the risk of specification gaming, in which agents might strategically underperform or overperform. It is therefore necessary to determine whether the current evaluation design contains exploitable loopholes.

References

PaperBench rubrics have been carefully designed to avoid false negatives and false positives, but given the large number of nodes and the complexity of paper replication, we cannot yet rule out loopholes in our evaluation.

PaperBench: Evaluating AI's Ability to Replicate AI Research (arXiv:2504.01848, Starace et al., 2 Apr 2025), Appendix A.3, "Specification gaming and adversarial agents"