Persistence of gradient imbalance beyond RL post-training
Determine whether the cross-task gradient-magnitude imbalance observed during reinforcement learning post-training of multi-task LLMs also arises during the pre-training and supervised fine-tuning phases.
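To make the phenomenon concrete, the sketch below (a hypothetical illustration, not taken from the paper) measures per-task gradient norms for a shared linear model trained on two synthetic tasks whose losses differ in scale. The `task_gradient` helper and the 10x loss scaling are assumptions chosen purely to show how one task's gradients can dominate a shared parameter update; detecting an analogous disparity across phases is the open question.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=5)  # shared parameters across both tasks

def task_gradient(w, X, y, scale):
    # Gradient of a scaled MSE loss: scale * (2/N) * X^T (Xw - y)
    resid = X @ w - y
    return scale * 2.0 * X.T @ resid / len(y)

# Two synthetic tasks drawn from the same distribution ...
X_a, y_a = rng.normal(size=(64, 5)), rng.normal(size=64)
X_b, y_b = rng.normal(size=(64, 5)), rng.normal(size=64)

# ... but task B's loss is scaled 10x (an assumed imbalance)
g_a = task_gradient(w, X_a, y_a, scale=1.0)
g_b = task_gradient(w, X_b, y_b, scale=10.0)

ratio = np.linalg.norm(g_b) / np.linalg.norm(g_a)
print(f"grad-norm ratio (task B / task A): {ratio:.1f}")
```

If such a ratio stays far from 1 over training, the larger-gradient task dominates shared updates; the question above asks whether this disparity, documented for RL post-training, also shows up when the same per-task norms are logged during pre-training and supervised fine-tuning.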
References
It remains unclear whether gradient imbalance also persists in other phases, such as pre-training and supervised fine-tuning.
— Imbalanced Gradients in RL Post-Training of Multi-Task LLMs
(2510.19178 - Wu et al., 22 Oct 2025) in Section: Limitations