
Generalization from RL training distribution to held-out test sets

Determine how large language models trained via reinforcement learning generalize from the in-distribution training prompts to held-out test sets. Characterize the relationship between in-distribution validation scaling curves and downstream generalization performance, and identify algorithmic factors that govern generalization under multi-epoch RL training.
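One concrete way to characterize this relationship, sketched below under assumptions that are not taken from the paper, is to record an in-distribution validation score and a held-out benchmark score at each RL checkpoint and then measure how tightly the two curves move together. The checkpoint values in this Python sketch are synthetic placeholders for illustration only.

```python
# Minimal sketch (illustrative, not from the paper): quantifying how
# in-distribution validation performance tracks held-out test performance
# across RL training checkpoints.
import numpy as np

# Hypothetical per-checkpoint measurements collected during RL training:
# in-distribution validation pass rate and held-out benchmark accuracy.
val_in_dist = np.array([0.42, 0.48, 0.55, 0.61, 0.66, 0.70, 0.73])
heldout_acc = np.array([0.31, 0.35, 0.41, 0.45, 0.47, 0.50, 0.51])

# Pearson correlation: how tightly downstream generalization tracks
# the in-distribution validation curve.
r = np.corrcoef(val_in_dist, heldout_acc)[0, 1]

# Simple linear fit as one way to summarize the relationship; the slope
# estimates how much held-out gain each point of in-distribution
# validation improvement buys on average.
slope, intercept = np.polyfit(val_in_dist, heldout_acc, deg=1)

print(f"Pearson r = {r:.3f}")
print(f"held-out ~ {slope:.2f} * in-dist + {intercept:.2f}")
```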

References

This still leaves the question of how well the LLM would generalize from the training distribution to held-out test sets. While a full characterization of generalization is beyond the scope of our work, we do observe a correlation between in-distribution validation and downstream generalization performance.

The Art of Scaling Reinforcement Learning Compute for LLMs (Khatri et al., 15 Oct 2025), Section 6 (Discussion), Generalization bullet