Alternative reward computation methods for rubric-based RL in instruction following

Develop and evaluate alternative reward computation methods for rubric-based reinforcement learning (RL) in instruction-following tasks. These methods should go beyond the current all-or-nothing reward R(prompt, response, rubric), which grants success only when every rubric criterion is satisfied. One example is a weighted sum of the per-criterion binary labels v_i produced by the rubric verifier.
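For concreteness, the two rewards can be written as follows. This assumes a rubric with n criteria, verifier labels v_i ∈ {0, 1}, and non-negative weights w_i. The weights and the normalization by their sum are illustrative assumptions, not choices stated in the source:

```latex
R_{\text{all}}(\text{prompt}, \text{response}, \text{rubric})
  = \prod_{i=1}^{n} v_i,
\qquad
R_{w}(\text{prompt}, \text{response}, \text{rubric})
  = \frac{\sum_{i=1}^{n} w_i \, v_i}{\sum_{i=1}^{n} w_i},
\quad w_i \ge 0,\ \sum_{i=1}^{n} w_i > 0.
```

Because each v_i is binary, the product equals 1 exactly when all criteria are satisfied. With uniform weights (w_i = 1 for all i), R_w reduces to the fraction of satisfied criteria. The normalization keeps R_w in [0, 1], the same range as the all-or-nothing reward.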

Background

RIFL computes a sequence-level reward using rubric verification, with a simple all-or-nothing design that grants reward only when the response satisfies all rubric criteria. The authors conduct preliminary ablations on reward design and observe differences in performance across all-or-nothing, fractional, and hybrid rewards.
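The ablated reward variants can be sketched in Python as below. This is a minimal illustration under stated assumptions. The fractional reward is read as the share of satisfied criteria. The hybrid is a hypothetical convex mix controlled by an illustrative parameter lam, since its exact definition is not given here. weighted_sum is one possible form of the "weighted sum of v_i" that the authors leave for future work. All function names are hypothetical.

```python
from typing import Sequence


def _check(labels: Sequence[int]) -> None:
    # Assumption: a rubric has at least one criterion and labels are binary.
    if not labels:
        raise ValueError("rubric must contain at least one criterion")
    if any(v not in (0, 1) for v in labels):
        raise ValueError("verifier labels v_i must be 0 or 1")


def all_or_nothing(labels: Sequence[int]) -> float:
    """1.0 only if every rubric criterion is satisfied, else 0.0."""
    _check(labels)
    return 1.0 if all(v == 1 for v in labels) else 0.0


def fractional(labels: Sequence[int]) -> float:
    """Fraction of satisfied criteria (assumed reading of 'fractional')."""
    _check(labels)
    return sum(labels) / len(labels)


def hybrid(labels: Sequence[int], lam: float = 0.5) -> float:
    """Hypothetical hybrid: convex mix of all-or-nothing and fractional.

    `lam` is an illustrative parameter, not taken from the source.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * all_or_nothing(labels) + (1.0 - lam) * fractional(labels)


def weighted_sum(labels: Sequence[int], weights: Sequence[float]) -> float:
    """Normalized weighted sum of v_i with non-negative per-criterion weights."""
    _check(labels)
    if len(labels) != len(weights):
        raise ValueError("need exactly one weight per criterion")
    total = sum(weights)
    if total <= 0 or any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative with a positive sum")
    return sum(w * v for w, v in zip(weights, labels)) / total


if __name__ == "__main__":
    v = [1, 1, 0, 1]  # toy verifier labels for a 4-criterion rubric
    print(all_or_nothing(v))               # 0.0
    print(fractional(v))                   # 0.75
    print(hybrid(v, lam=0.5))              # 0.375
    print(weighted_sum(v, [3, 1, 1, 1]))   # 5/6, about 0.8333
```

On the toy labels, only the all-or-nothing reward gives zero credit to a response that misses one criterion. The other variants give partial credit, which provides a denser learning signal but can also reward responses that violate an important constraint. Choosing the weights, or deciding when partial credit is appropriate, is the open design question described here.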

They explicitly state that exploring other reward computation methods, including weighted combinations of per-criterion labels, is deferred to future work, indicating an unresolved design space for how best to construct rubric-based rewards for RL training.

References

We leave other reward computation methods (for instance, weighted sum of v_i) for future work.

— Rubric-Based Benchmarking and Reinforcement Learning for Advancing LLM Instruction Following (2511.10507 - He et al., 13 Nov 2025) in Section “Reward Design and Shaping”
