Whether and how RLRR alleviates reward over-optimization
Determine whether and how reinforcement learning from rubrics-based reward (RLRR) mitigates reward over-optimization in large language model post-training. Reward over-optimization refers to a policy exploiting a misspecified proxy reward to achieve high scores while true output quality deteriorates.
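The failure mode can be made concrete with a toy sketch (not from the paper): a 1-D policy parameter is trained by gradient ascent on a proxy reward that adds an exploitable linear term (e.g. a length bias the reward model over-credits) to the true reward. The proxy optimum overshoots the true optimum, so the proxy score keeps rising while true quality falls.

```python
# Hypothetical illustration of reward over-optimization, not the paper's setup.
def true_reward(theta):
    # What we actually care about; maximized at theta = 1.0.
    return -(theta - 1.0) ** 2

def proxy_reward(theta):
    # Misspecified proxy: true reward plus an exploitable linear term.
    return true_reward(theta) + 0.5 * theta

def proxy_grad(theta):
    # Gradient of the proxy; zero at theta = 1.25, past the true optimum.
    return -2.0 * (theta - 1.0) + 0.5

theta, lr = 0.0, 0.1
for step in range(100):
    theta += lr * proxy_grad(theta)  # ascend the proxy reward

# Once theta passes 1.0, the proxy still increases while true reward declines.
print(f"theta = {theta:.3f}, true reward = {true_reward(theta):.4f}")
```

The question above asks whether replacing such a scalar proxy with rubrics-based rewards changes this dynamic, and through what mechanism.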
References
However, it's still unclear if, and how, RLRR alleviates reward over-optimization.
— Chasing the Tail: Effective Rubric-based Reward Modeling for Large Language Model Post-Training
(2509.21500 - Zhang et al., 25 Sep 2025) in Section 2 (Preliminaries), paragraph "Reinforcement learning from rubrics-based reward"