Alternative reward computation methods for rubric-based RL in instruction following
Develop and evaluate alternative reward computation methods for rubric-based reinforcement learning in instruction-following tasks. The current reward R(prompt, response, rubric) is all-or-nothing: it grants success only when every criterion in the rubric is satisfied. One candidate alternative is a weighted sum of the per-criterion binary labels v_i produced by the rubric verifier.
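To make the contrast concrete, the following is a minimal Python sketch of the two reward shapes. It is an illustration only, not the paper's implementation: the function names, the normalization of the weighted sum to the range [0, 1], the uniform default weights, and the rejection of empty rubrics are all assumptions introduced here. The source specifies only that v_i are binary per-criterion labels and that the current reward requires all of them to be satisfied.

```python
from typing import Optional, Sequence


def _check_labels(v: Sequence[int]) -> None:
    # Assumption: a rubric has at least one criterion and labels are binary.
    if len(v) == 0:
        raise ValueError("rubric must contain at least one criterion")
    if any(x not in (0, 1) for x in v):
        raise ValueError("labels v_i must be 0 or 1")


def all_or_nothing_reward(v: Sequence[int]) -> float:
    """Return 1.0 only if every criterion label v_i is 1, else 0.0."""
    _check_labels(v)
    return 1.0 if all(x == 1 for x in v) else 0.0


def weighted_sum_reward(
    v: Sequence[int], weights: Optional[Sequence[float]] = None
) -> float:
    """Hypothetical alternative: weighted sum of v_i, normalized to [0, 1].

    With weights=None every criterion counts equally, so the reward is
    the fraction of satisfied criteria.
    """
    _check_labels(v)
    if weights is None:
        weights = [1.0] * len(v)
    if len(weights) != len(v):
        raise ValueError("need exactly one weight per criterion")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return sum(w * x for w, x in zip(weights, v)) / total


if __name__ == "__main__":
    labels = [1, 0, 1]  # verifier says criteria 1 and 3 pass, criterion 2 fails
    print(all_or_nothing_reward(labels))
    print(weighted_sum_reward(labels))
    print(weighted_sum_reward(labels, weights=[2.0, 1.0, 1.0]))
```

Under these assumptions, a response that misses a single criterion receives zero reward from the all-or-nothing rule but partial credit from the weighted sum. Whether that denser signal helps or hurts instruction following is the question left open here.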
References
We leave other reward computation methods (for instance, weighted sum of v_i) for future work.
— Rubric-Based Benchmarking and Reinforcement Learning for Advancing LLM Instruction Following
(2511.10507 - He et al., 13 Nov 2025) in Section “Reward Design and Shaping”