Design a reliable rubric generator and verifier for large-scale instruction-following training
Design a reliable rubric generator and a reliable rubric verifier for large-scale reinforcement learning pipelines that aim to improve large language models’ instruction-following capabilities. The generator synthesizes rubrics for each user prompt from raw training data, and the verifier determines whether a given model response satisfies each rubric criterion. Together they should provide dependable rubrics and judgments for training when human labeling is impractical.
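To make the two roles concrete, the following is a minimal Python sketch of how a generator and a verifier could fit together. It is not the method of the cited paper. Every name in it (`Criterion`, `generate_rubric`, `verify`, `reward`) is hypothetical, the keyword-based generator and the rule-based checks are toy stand-ins for what would presumably be LLM calls, and scoring a response by the fraction of satisfied criteria is an assumption made only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Criterion:
    """One rubric item (hypothetical structure, not from the paper)."""
    description: str
    check: Callable[[str], bool]  # toy stand-in for an LLM-based judgment


def _has_bullets(response: str) -> bool:
    # True if any line starts with a "-" or "*" bullet marker.
    return any(line.lstrip().startswith(("-", "*")) for line in response.splitlines())


def _under_50_words(response: str) -> bool:
    return len(response.split()) < 50


def generate_rubric(prompt: str) -> List[Criterion]:
    """Toy generator: derives criteria from simple cues in the user prompt."""
    lowered = prompt.lower()
    rubric: List[Criterion] = []
    if "bullet" in lowered:
        rubric.append(Criterion("uses bullet points", _has_bullets))
    if "under 50 words" in lowered:
        rubric.append(Criterion("is under 50 words", _under_50_words))
    return rubric


def verify(response: str, rubric: List[Criterion]) -> List[bool]:
    """Toy verifier: one pass/fail judgment per rubric criterion."""
    return [criterion.check(response) for criterion in rubric]


def reward(response: str, rubric: List[Criterion]) -> float:
    """Assumed scoring rule: fraction of criteria the response satisfies."""
    judgments = verify(response, rubric)
    return sum(judgments) / len(judgments) if judgments else 0.0


if __name__ == "__main__":
    rubric = generate_rubric("Summarize the article in bullet points, under 50 words.")
    print([c.description for c in rubric])
    print(verify("- point one\n- point two", rubric))
    print(reward("A plain sentence with no list.", rubric))
```

In this sketch an unreliable generator shows up as missing or spurious criteria, and an unreliable verifier shows up as wrong entries in the judgment list; either kind of error would propagate into the training signal.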
References
How to design a good generator and verifier to provide reliable rubrics and judgments for training is still an open problem.
— Rubric-Based Benchmarking and Reinforcement Learning for Advancing LLM Instruction Following
(2511.10507 - He et al., 13 Nov 2025) in Introduction, second bullet under “However, developing a scalable learning pipeline for advanced IF still faces several challenges”