Persistence of gradient imbalance beyond RL post-training
Determine whether the cross-task gradient-magnitude imbalance observed during reinforcement learning post-training of multi-task LLMs also arises during the pre-training and supervised fine-tuning phases.
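To make the phenomenon concrete, the sketch below (a hypothetical illustration, not taken from the paper) measures per-task gradient norms for a shared linear model trained on two synthetic tasks whose losses differ in scale. The `task_gradient` helper and the 10x loss scaling are assumptions chosen purely to show how one task's gradients can dominate a shared parameter update; detecting an analogous disparity across phases is the open question.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=5)  # shared parameters across both tasks

def task_gradient(w, X, y, scale):
    # Gradient of a scaled MSE loss: scale * (2/N) * X^T (Xw - y)
    resid = X @ w - y
    return scale * 2.0 * X.T @ resid / len(y)

# Two synthetic tasks drawn from the same distribution ...
X_a, y_a = rng.normal(size=(64, 5)), rng.normal(size=64)
X_b, y_b = rng.normal(size=(64, 5)), rng.normal(size=64)

# ... but task B's loss is scaled 10x (an assumed imbalance)
g_a = task_gradient(w, X_a, y_a, scale=1.0)
g_b = task_gradient(w, X_b, y_b, scale=10.0)

ratio = np.linalg.norm(g_b) / np.linalg.norm(g_a)
print(f"grad-norm ratio (task B / task A): {ratio:.1f}")
```

If such a ratio stays far from 1 over training, the larger-gradient task dominates shared updates; the question above asks whether this disparity, documented for RL post-training, also shows up when the same per-task norms are logged during pre-training and supervised fine-tuning.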
References
It remains unclear whether gradient imbalance also persists in other phases, such as pre-training and supervised fine-tuning.
— Imbalanced Gradients in RL Post-Training of Multi-Task LLMs
(2510.19178 - Wu et al., 22 Oct 2025) in Section: Limitations