Identify semantic properties where rubric-based verification surpasses preference learning
Determine which semantic properties can be assessed by rubric-based learned verifiers that achieve inter-annotator agreement of κ ≥ 0.7, such that specification-based training (e.g., CAPE with rubric-trained verifiers) outperforms preference-based post-training methods such as RLHF or DPO on those properties.
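A minimal sketch of the agreement-screening step this question implies, assuming scikit-learn is available: two annotators score the same outputs against a per-property rubric, Cohen's kappa is computed per property, and only properties clearing the 0.7 bar are kept as candidates for rubric-based verification. The property names and scores below are illustrative placeholders, not data from the paper.

```python
# Illustrative sketch: screen semantic properties by inter-annotator agreement.
# Cohen's kappa is computed per property from two annotators' binary rubric
# judgments; properties at or above the 0.7 threshold are kept as candidates
# for rubric-based verification. All names and scores are placeholders.

from sklearn.metrics import cohen_kappa_score

KAPPA_THRESHOLD = 0.7

# Hypothetical rubric judgments (1 = satisfies the rubric item, 0 = does not)
# from two annotators over the same eight model outputs.
annotations = {
    "factual_consistency": ([1, 1, 0, 1, 0, 1, 1, 0],
                            [1, 1, 0, 1, 0, 1, 1, 0]),
    "tone_appropriateness": ([1, 0, 1, 1, 0, 1, 0, 1],
                             [0, 1, 1, 0, 1, 1, 1, 0]),
}


def verifiable_properties(annotations, threshold=KAPPA_THRESHOLD):
    """Return {property: kappa} for properties whose agreement clears the threshold."""
    kappas = {prop: cohen_kappa_score(a, b) for prop, (a, b) in annotations.items()}
    return {prop: k for prop, k in kappas.items() if k >= threshold}


if __name__ == "__main__":
    for prop, kappa in verifiable_properties(annotations).items():
        print(f"{prop}: kappa = {kappa:.2f} (candidate for rubric-based verification)")
```

Under the framing above, properties that fail this agreement screen would remain better served by preference-based methods such as RLHF or DPO.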
References
Open question: For which semantic properties can rubrics achieve sufficient inter-annotator agreement (κ > 0.7) to outperform preference learning?
— CAPE: Capability Achievement via Policy Execution
(2512.14761, Ball, 15 Dec 2025), Section 9.4 (Generalization Beyond Tested Domains)