Impact of rubric-judge scoring scales on training

Evaluate alternative scoring scales for the rubric-based reward judge used in RLER, and determine how they affect reinforcement learning stability and downstream performance relative to the current 0–2 scale, which is normalized by dividing by two.

Background

In the current implementation, the judge scores on a 0–2 scale and the score is normalized to the range 0–1 by dividing by two. The authors did not explore other scales in this work.
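The normalization described above can be illustrated with a minimal sketch. The function names `normalize_score` and `scale_levels` and the `scale_max` parameter are hypothetical and do not come from the paper's code; the sketch assumes that an alternative integer scale from 0 to `scale_max` would be normalized the same way, by dividing by its maximum.

```python
def normalize_score(raw_score: float, scale_max: int = 2) -> float:
    """Map a judge score in [0, scale_max] to [0, 1] by dividing by the maximum.

    With the default scale_max=2 this matches the divide-by-two normalization
    described in the text. Other values of scale_max are an assumption about
    how an alternative scale might be handled.
    """
    if scale_max <= 0:
        raise ValueError("scale_max must be positive")
    if not 0 <= raw_score <= scale_max:
        raise ValueError(f"score {raw_score} is outside [0, {scale_max}]")
    return raw_score / scale_max


def scale_levels(scale_max: int) -> list[float]:
    """List the distinct normalized rewards an integer 0..scale_max scale can produce."""
    return [normalize_score(s, scale_max) for s in range(scale_max + 1)]


if __name__ == "__main__":
    # Current 0-2 scale: three possible reward levels.
    print(scale_levels(2))
    # A hypothetical finer 0-4 scale: five possible reward levels.
    print(scale_levels(4))
```

A coarser or finer scale changes only how many distinct reward levels exist between 0 and 1. Whether that change helps or hurts training is the open question posed here.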

Understanding how score granularity and range affect optimization dynamics could lead to better reward sensitivity, better-controlled reward variance, and stronger overall training outcomes in long-form deep research tasks.

References

"We leave exploring different scoring scales to future work."

— DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research (2511.19399 - Shao et al., 24 Nov 2025) in Appendix, Subsection "Rubric Reward Judge Prompt"
