Stabilizing long-horizon training and allocating SFT vs RL steps for optimal performance
Determine principled strategies for stabilizing long-horizon training of data-analytic agents and for allocating training steps between supervised fine-tuning (SFT) and reinforcement learning (RL) so as to maximize performance.
References
Yet, in a new scenario, it remains unclear how to stabilize long-horizon agent training and how to allocate training steps across SFT and RL to achieve optimal performance.
Source: Scaling Generalist Data-Analytic Agents (arXiv:2509.25084, Qiao et al., 29 Sep 2025), Section 1 Introduction, Challenges (2) Improper training strategy