OpenRubrics Architecture

Updated 21 January 2026
  • OpenRubrics is a scalable architecture that synthesizes rubrics via contrastive generation and rejection sampling to enhance LLM alignment.
  • It integrates structured natural language evaluations within both supervised and reinforcement learning frameworks for reliable, multidimensional feedback.
  • Empirical results demonstrate improved throughput and benchmark performance, offering a robust alternative to traditional human annotation methods.

OpenRubrics defines a scalable, synthetic rubric-generation and reward-modeling architecture designed to address key deficiencies in LLM alignment—specifically the limitations of scalar/pairwise judgments and static rubric schemas (Liu et al., 9 Oct 2025). The system is distinguished by its capacity for contrastive rubric generation, preference-label consistency via rejection sampling, and end-to-end integration in supervised and reinforcement learning paradigms. It enables the automatic construction of comprehensive (prompt, rubric) pairs, facilitating interpretable and multidimensional evaluation criteria for reward models, while maintaining high throughput and reliability compared to human annotation. OpenRubrics leverages structured natural language as scaffolding for alignment signals, demonstrating empirically superior performance both for reward models (Rubric-RM) and aligned LLM policies.

1. Dataset Construction Pipeline

OpenRubrics builds on a composite data pipeline sourcing preference and instruction-following samples from UltraFeedback (Evol-Instruct, UltraChat, ShareGPT, TruthfulQA), Tulu 2.5 (AlpacaFarm, Chatbot Arena, SHP, Capybara), HelpSteer 3, Skywork-Preference, MegaScience, and medical datasets. Preference pairs $(x_i, \hat y_i^+, \hat y_i^-, \ell_i)$ are derived by selecting chosen and rejected responses either by human rating, open-source reward-model ranking (e.g., Athene-RM-8B, Skywork-Reward-V2), or programmatic verifiable-IF checks. The data is filtered for triviality (e.g., identical responses, formatting violations), truncated to ≤1024 tokens, and deduplicated by prompt-response fingerprints, resulting in a dataset $\mathcal{D}=\{(x_i, \hat y_i^+, \hat y_i^-, \ell_i)\}_{i=1}^N$ (Liu et al., 9 Oct 2025).

| Source | Preference Extraction Method | Filtering Capabilities |
| --- | --- | --- |
| UltraFeedback | Human rating | Deduplication/truncation |
| Tulu 2.5 | Reward model ranking | Verifiable checks |
| HelpSteer 3 | Reward model ranking | Canonicalization |

This pipeline establishes the foundational triplet dataset for subsequent rubric synthesis and reward-model training.
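As a concrete illustration of the filtering stage, the sketch below assumes whitespace tokenization and a SHA-256 triplet fingerprint; these choices, and all function names, are illustrative simplifications rather than the paper's actual implementation:

```python
import hashlib

MAX_TOKENS = 1024  # truncation limit stated in the pipeline description

def fingerprint(prompt: str, chosen: str, rejected: str) -> str:
    """Deduplication key over the full (prompt, chosen, rejected) triplet."""
    blob = "\x1f".join((prompt, chosen, rejected)).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()

def filter_triplets(triplets, tokenize=str.split):
    """Drop trivial pairs and duplicates; truncate over-long responses.

    `triplets` is an iterable of (prompt, chosen, rejected, label) tuples;
    `tokenize` is a whitespace stand-in for the real tokenizer.
    """
    seen, kept = set(), []
    for x, y_pos, y_neg, label in triplets:
        if y_pos.strip() == y_neg.strip():        # trivial: identical responses
            continue
        y_pos = " ".join(tokenize(y_pos)[:MAX_TOKENS])  # crude ≤1024-token cut
        y_neg = " ".join(tokenize(y_neg)[:MAX_TOKENS])
        key = fingerprint(x, y_pos, y_neg)
        if key in seen:                           # deduplicate by fingerprint
            continue
        seen.add(key)
        kept.append((x, y_pos, y_neg, label))
    return kept
```

In practice the fingerprint would be computed over canonicalized text and the token cut done with the model tokenizer, but the control flow is the same.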

2. Contrastive Rubric Generation (CRG) and Rejection Sampling

Contrastive Rubric Generation operationalizes the extraction of both "hard rules" (explicit constraints) and "principles" (implicit qualities) that distinguish chosen from rejected responses. A pretrained instruction-tuned LLM $h_\psi$ is prompted with $(x_i, \hat y_i^+, \hat y_i^-, \ell_i)$, producing $\mathcal{R}(x_i) = \{ c_{i,1}, \ldots, c_{i,K_i} \}$, which codifies discriminative evaluation criteria. The procedure involves:

  • Extracting non-negotiable hard rules directly from prompt requirements.
  • Abstracting concrete differences between $\hat y_i^+$ and $\hat y_i^-$ into principles.
  • Optionally applying contrastive-style margin-based loss:

$$\mathcal{L}_{\mathrm{CRG}} = \sum_{i=1}^N \sum_{j=1}^{K_i} \Big[ -\log \sigma \big( s_\psi(c_{i,j}, x_i, \hat y_i^+) - s_\psi(c_{i,j}, x_i, \hat y_i^-) \big) \Big]$$

where $s_\psi(c, x, y)$ denotes compatibility between criterion and response (Liu et al., 9 Oct 2025).
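The margin-based objective can be computed directly once per-criterion compatibility scores are available; a minimal numerical sketch, assuming scores are plain floats in nested lists (the `softplus` helper and data layout are illustrative):

```python
import math

def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def crg_loss(scores_pos, scores_neg):
    """Sum of -log sigma(s+ - s-) over all (i, j) criterion scores.

    scores_pos[i][j] stands in for s_psi(c_{i,j}, x_i, y_i^+),
    and likewise scores_neg for the rejected response.
    """
    return sum(
        softplus(-(sp - sn))  # -log sigma(d) == softplus(-d)
        for sp_row, sn_row in zip(scores_pos, scores_neg)
        for sp, sn in zip(sp_row, sn_row)
    )
```

The identity $-\log \sigma(d) = \mathrm{softplus}(-d)$ avoids overflow for large negative margins, which matters when many criteria are summed.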

Label consistency is ensured via rejection sampling: only rubrics yielding correct preference predictions by the generator are retained, giving $\mathcal D_{\mathrm{rubric}}=\{(x_i, \hat y^+_i, \hat y^-_i, \mathcal{R}^*(x_i))\}$, directly mitigating label-flip or noise propagation.
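The rejection-sampling filter amounts to a simple accept/resample loop; a sketch, where `generate_rubric` and `predict_label` are hypothetical wrappers around the rubric-generating LLM and the retry budget is illustrative:

```python
def rejection_sample_rubrics(triplets, generate_rubric, predict_label, n_tries=4):
    """Keep only rubrics under which the generator reproduces the gold label.

    `generate_rubric(x, y_pos, y_neg)` proposes a rubric for the pair;
    `predict_label(x, y_pos, y_neg, rubric)` judges the pair under that rubric.
    Both names are illustrative stand-ins, not the paper's API.
    """
    kept = []
    for x, y_pos, y_neg, label in triplets:
        for _ in range(n_tries):
            rubric = generate_rubric(x, y_pos, y_neg)
            if predict_label(x, y_pos, y_neg, rubric) == label:  # consistency check
                kept.append((x, y_pos, y_neg, rubric))
                break  # rubric accepted; otherwise resample
    return kept
```

Triplets whose rubrics never reproduce the gold label are dropped entirely, which is how label noise is kept out of $\mathcal D_{\mathrm{rubric}}$.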

3. Rubric-RM Reward Model Architecture

Rubric-RM encapsulates two core modules: the rubric generator $g_\theta$ and the rubric-conditioned judge $r_\phi$, both implemented with Qwen-3 (4B/8B). The generator is supervised fine-tuned on next-token cross-entropy:

$$\mathcal{L}_{\mathrm{SFT}}^{\mathrm{rubric}} = -\mathbb{E}_{(x, y^+, y^-, \mathcal{R}^*)} \sum_{t=1}^{|\mathcal{R}^*|} \log p_\theta(\mathcal{R}^*_t \mid x, y^+, y^-, \mathcal{R}^*_{<t})$$

The judge accepts $[x;\, y^+;\, y^-;\, \mathcal{R}(x)]$ and outputs the preference label, similarly trained with cross-entropy over label tokens:

$$\mathcal{L}_{\mathrm{SFT}}^{\mathrm{rm}} = -\mathbb{E}_{(x, y^+, y^-, \mathcal{R}^*, \ell)} \sum_{t=1}^{|\ell|} \log p_\phi(\ell_t \mid x, y^+, y^-, \mathcal{R}^*, \ell_{<t})$$

Key configuration parameters for Rubric-RM-8B include batch size 64, learning rate $5 \times 10^{-6}$, 2 epochs, and a maximum of 6144 tokens per sample (Liu et al., 9 Oct 2025).

4. End-to-End Workflow and Integration

The OpenRubrics pipeline proceeds as follows:

  1. Triplet collection: $\{ (x, \hat y^+, \hat y^-, \ell) \}$.
  2. Application of CRG + rejection sampling yields filtered rubrics $\mathcal{R}^*(x)$.
  3. Supervised fine-tuning of $g_\theta$ on $\mathcal D_{\mathrm{rubric}}$.
  4. Supervised fine-tuning of $r_\phi$ on preference labels conditioned on rubrics.
  5. Inference:
    • Generate a rubric for the new response pair: $\hat{\mathcal{R}}(x) = g_\theta(x, y^A, y^B)$.
    • Compute $\hat{\ell} = \arg\max_{k \in \{A, B\}} p_\phi(k \mid x, y^A, y^B, \hat{\mathcal{R}}(x))$.
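The two-stage inference step above can be sketched as follows, with `generator` and `judge` as hypothetical callables wrapping $g_\theta$ and $r_\phi$, and the judge returning label probabilities as a dict:

```python
def rubric_rm_predict(x, y_a, y_b, generator, judge):
    """Two-stage Rubric-RM inference: generate a rubric, then judge under it.

    `generator(x, y_a, y_b)` returns a rubric string; `judge(...)` returns
    {"A": p_A, "B": p_B}. Both interfaces are illustrative assumptions.
    """
    rubric = generator(x, y_a, y_b)   # amortizable: reusable across judge calls
    probs = judge(x, y_a, y_b, rubric)
    return max(probs, key=probs.get)  # argmax over {A, B}
```

Because the rubric is produced once and can be cached, repeated judgments over the same prompt amortize the generation cost, which underlies the throughput figures in Section 5.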

Integration enables interpretability, modular rubric updating, and inference-time amortization. This structure generalizes across standard RLHF and principle-driven alignment paradigms (Liu et al., 9 Oct 2025).

5. Scalability, Benchmark Performance, and Policy Transfer

Empirical evaluation demonstrates Rubric-RM’s superiority across multiple reward-modeling benchmarks (RewardBench, RM-Bench, IFBench), with Rubric-RM-4B achieving an average 65.6% accuracy and Rubric-RM-8B reaching 68.5%. Ensemble voting (Rubric-RM-8B-voting@5) achieves 71.2%, closely approximating larger commercial RMs. Policy fine-tuning with DPO shows +3–4 point improvements on instruction-following (IFEval, InfoBench), and best open-source performance (∼ 50–57% wins) on Arena-Hard and AlpacaEval. Biomedical benchmarks (HealthBench) reflect similarly robust gains: Rubric-RM-8B records 68.3% vs. baseline 63.3%; ensemble voting approaches commercial results (72.9%) (Liu et al., 9 Oct 2025).

Amortizable rubric generation substantially reduces wall-clock time per evaluation: Rubric-RM-8B clocks 130 s/100 pairs, outperforming RRM-7B (203 s) and RM-R1-14B (322–382 s).

6. Alignment Signal, Interpretability, and Principle-Driven Reward Modeling

Contrastively generated, consistency-filtered rubrics provide multifaceted, interpretable alignment signals compared to previous scalar or generative reasoning-based reward models. OpenRubrics scaffolds the transition toward principle-driven paradigms, narrowing the gap between costly human evaluation and automated alignment. Structured rubrics not only serve as reward functions but also inform model interpretability and debugging—each rubric is traceable to explicit and implicit response qualities. Rubric synthesis and integration protocols facilitate ongoing rubric refinement and transferability across domains, supporting robust evaluation under reinforcement learning and instruction-following (Liu et al., 9 Oct 2025). A plausible implication is that further scaling or hybridization with dynamic online rubric elicitation (cf. OnlineRubrics (Rezaei et al., 8 Oct 2025)) may yield even more adaptive and resilient alignment frameworks.
