Stabilizing mixed IFT–reasoning training and reliably exploiting its benefits
Develop training strategies that stabilize mixed-style supervised fine-tuning, in which Instruction Fine-Tuning (IFT) and explicit chain-of-thought reasoning instances are combined from the start of training (the mixed regime), so that models avoid instability and abrupt mode-switching and consistently realize the potential performance benefits of mixed-style training across tasks and reasoning ratios.
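To make the mixed regime concrete, the sketch below builds one shuffled training set in which a target fraction of examples is reasoning (chain-of-thought) data, so both styles are present from the first training step. This is a minimal illustration, not the paper's pipeline: the function name, the in-memory example pools, and the sizing rule (cap the total so neither pool is oversampled) are all assumptions.

```python
import random

def build_mixed_dataset(ift_pool, reasoning_pool, reasoning_ratio, seed=0):
    """Hypothetical sketch: return a shuffled mix of IFT and reasoning
    examples whose reasoning fraction approximates `reasoning_ratio`.

    The total size is capped so that neither pool is sampled beyond
    its actual size (no oversampling / duplication).
    """
    if not 0.0 <= reasoning_ratio <= 1.0:
        raise ValueError("reasoning_ratio must be in [0, 1]")
    rng = random.Random(seed)

    # Largest total T such that T * ratio reasoning examples and
    # T * (1 - ratio) IFT examples are available.
    if reasoning_ratio == 0.0:
        total = len(ift_pool)
    elif reasoning_ratio == 1.0:
        total = len(reasoning_pool)
    else:
        total = min(int(len(reasoning_pool) / reasoning_ratio),
                    int(len(ift_pool) / (1.0 - reasoning_ratio)))

    n_reason = round(total * reasoning_ratio)
    n_ift = total - n_reason
    mixed = rng.sample(reasoning_pool, n_reason) + rng.sample(ift_pool, n_ift)
    rng.shuffle(mixed)  # mixed regime: both styles interleaved from step one
    return mixed
```

Sweeping `reasoning_ratio` over, say, {0.1, 0.25, 0.5} reproduces the "reasoning ratios" axis of the study; the sequential setting would instead train on the two pools in separate phases.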
References
In consequence, we focus on the sequential setting for the remainder of this study, leaving stabilization of mixed-style training and consistent exploitation of its potential benefits to future work.
— When Does Reasoning Matter? A Controlled Study of Reasoning's Contribution to Model Performance
(Boizard et al., arXiv:2509.22193, 26 Sep 2025), Section 3.2, "Impact of Mixing IFT and Reasoning Data"