Identify semantic properties where rubric-based verification surpasses preference learning

Determine which semantic properties can be assessed by rubric-trained learned verifiers that achieve inter-annotator agreement κ > 0.7, such that specification-based training (e.g., CAPE with rubric-trained verifiers) outperforms preference-based post-training (e.g., RLHF or DPO) on those properties.
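
As a concrete reading of the agreement bar, the sketch below computes Cohen's κ for two annotators applying a binary rubric criterion and checks the result against 0.7. The annotation data, and the assumption that κ here denotes Cohen's kappa over per-item rubric labels, are illustrative rather than taken from the paper.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    n = len(labels_a)
    assert n == len(labels_b) and n > 0
    # Observed agreement: fraction of items with identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum((freq_a[c] / n) * (freq_b[c] / n) for c in freq_a | freq_b)
    return (p_o - p_e) / (1 - p_e)  # undefined if p_e == 1 (constant labels)

# Hypothetical rubric judgments (1 = criterion satisfied, 0 = not).
ann_1 = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
ann_2 = [1, 1, 0, 1, 1, 1, 1, 0, 1, 1]
kappa = cohens_kappa(ann_1, ann_2)
print(f"kappa = {kappa:.2f}; rubric meets bar: {kappa > 0.7}")  # ~0.74, True
```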

Background

The paper proposes CAPE, a specification-driven post-training protocol that uses symbolic policies for structural properties and rubric-trained learned verifiers for semantic properties. Empirical results show high agreement and strong performance improvements for certain semantic domains (e.g., argument soundness and proof validity), while others (e.g., code correctness) approach but do not meet the same agreement threshold.
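
The split between symbolic policies and learned verifiers can be pictured as a routing step. The sketch below is a minimal, hypothetical rendering of that dispatch; the `Property` taxonomy and the `check_structure`/`score_rubric` callables are illustrative assumptions, not CAPE's actual API.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Property:
    name: str
    kind: str  # "structural" or "semantic" (assumed taxonomy)

def verify(prop: Property, output: str,
           check_structure: Callable[[str], bool],
           score_rubric: Callable[[str], float]) -> float:
    """Hypothetical CAPE-style dispatch: structural properties go to a
    symbolic (rule-based) check; semantic properties go to a
    rubric-trained learned verifier returning a graded score."""
    if prop.kind == "structural":
        return 1.0 if check_structure(output) else 0.0
    return score_rubric(output)  # e.g., a learned verifier's score in [0, 1]
```

Under this reading, the open question asks for which semantic properties the `score_rubric` path is trustworthy enough, as measured by annotator agreement on the rubric itself, to replace a preference-learned reward.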

To guide the broader applicability of CAPE's learned-verifier approach, the authors explicitly pose an open question: for which semantic properties can rubric-based verification achieve sufficiently high inter-annotator agreement (κ > 0.7) to reliably outperform preference learning? Answering it would delineate which capabilities rubrics can handle effectively and which may still require preference-based methods.

References

Open question: For which semantic properties can rubrics achieve sufficient inter-annotator agreement (κ > 0.7) to outperform preference learning?

CAPE: Capability Achievement via Policy Execution (2512.14761, Ball, 15 Dec 2025), Section 9.4 (Generalization Beyond Tested Domains)