Extending RLVR to Open-ended Tasks Without Verifiers

Develop effective strategies for extending Reinforcement Learning from Verifiable Rewards (RLVR) to open-ended natural language tasks, such as creative writing and brainstorming, where no objective ground-truth verifier exists, so that reliable learning remains possible despite the absence of verifiable rewards.

Background

Reinforcement Learning from Verifiable Rewards (RLVR) has shown strong success in domains like mathematics and coding where objective verifiers exist, enabling clear, automatically checkable reward signals.

For open-ended tasks (e.g., creative writing, brainstorming), there is no objective ground truth or straightforward verification mechanism, which makes it difficult to construct reliable reward signals. The paper highlights this as an unresolved challenge and motivates rubric-based judges as a step toward addressing it, while noting that the broader problem of achieving RLVR-like reliability in non-verifiable domains remains open.
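To make the rubric-based-judge idea concrete, the sketch below shows one plausible way to turn per-criterion judge scores into a scalar reward for RL fine-tuning: a judge model scores a response against each rubric criterion, and the scores are combined as a weighted mean. This is a minimal illustration under assumptions, not the paper's method; the criterion names, weights, and the `judge_scores` stand-in for an LLM judge call are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RubricCriterion:
    """One rubric item with a relative importance weight (hypothetical schema)."""
    name: str
    weight: float


def rubric_reward(scores: dict[str, float], rubric: list[RubricCriterion]) -> float:
    """Collapse per-criterion judge scores (each in [0, 1]) into one scalar reward
    via a weighted mean. In practice `scores` would come from an LLM judge
    prompted with the rubric; here it is supplied directly for illustration."""
    total_weight = sum(c.weight for c in rubric)
    return sum(c.weight * scores[c.name] for c in rubric) / total_weight


# Example rubric for a creative-writing task (names and weights are made up).
rubric = [
    RubricCriterion("coherence", weight=2.0),
    RubricCriterion("originality", weight=1.0),
]

# Stand-in for judge output: scores an LLM judge might assign to one response.
judge_scores = {"coherence": 0.8, "originality": 0.5}

reward = rubric_reward(judge_scores, rubric)
# Weighted mean: (2.0 * 0.8 + 1.0 * 0.5) / 3.0 = 0.7
print(reward)
```

The scalar output can then be fed to a standard policy-gradient RL loop in place of a verifier's binary pass/fail signal; the open question the paper raises is whether such judge-derived rewards are reliable enough to match RLVR's stability.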

References

Extending this success to open-ended tasks (e.g., creative writing, brainstorming) remains an open challenge due to the lack of ground-truth verifiers~\citep{zhang2025auditable, simonds2025self}.

Rethinking Rubric Generation for Improving LLM Judge and Reward Modeling for Open-ended Tasks (2602.05125 - Shen et al., 4 Feb 2026), in Related Works, Reinforcement Fine-Tuning (RFT) for Open-Ended Tasks