Designing high-quality RL training data for long-context reasoning
Determine effective principles and methodologies for designing high-quality reinforcement learning training data tailored to long-context reasoning in large language models, specifying the properties such data must have to reliably elicit advanced reasoning behaviors and support robust evaluation.
References
However, it leaves open key questions about how to design high-quality RL training data.
— LoongRL: Reinforcement Learning for Advanced Reasoning over Long Contexts
(2510.19363 - Wang et al., 22 Oct 2025) in Related Works, subsection “Reasoning and Long-Context Reasoning”