Training methods for long-form deep research models
Determine effective procedures to train language models directly for long-form deep research tasks, including how to structure training objectives, supervision, and reinforcement learning signals so that models reliably learn to produce high-quality, evidence-grounded long-form answers.
References
Many open questions remain around how best to train models directly on long-form tasks; to facilitate future research on this topic, we release all of our data, models, and code, including an MCP-based deep research library and evaluation suite with asynchronous tool-calling support.
DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research
(arXiv:2511.19399, Shao et al., 24 Nov 2025), in "Discussion and Future Work"