Underlying causes of cross-task gradient imbalance
Identify and explain the underlying mechanisms that cause cross-task gradient magnitude imbalances during reinforcement learning post-training of multi-task large language models, beyond correlations with standard training statistics such as rewards, advantages, and sequence lengths.
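To make the quantity in question concrete, the following is a minimal sketch of how a cross-task gradient magnitude imbalance might be measured. It assumes per-task gradients have already been flattened into plain lists of floats. The function names (`grad_norm`, `imbalance_report`), the task labels, and the numbers are all hypothetical and are not taken from the cited paper. The sketch only measures the imbalance; it says nothing about its cause, which is the open question.

```python
import math


def grad_norm(grad):
    """L2 norm of a flattened gradient given as a sequence of floats."""
    return math.sqrt(sum(g * g for g in grad))


def imbalance_report(grads_by_task):
    """Return per-task gradient norms and the largest-to-smallest norm ratio.

    grads_by_task: dict mapping a task name to its flattened gradient.
    (Hypothetical interface, for illustration only.)
    """
    norms = {task: grad_norm(g) for task, g in grads_by_task.items()}
    smallest = min(norms.values())
    largest = max(norms.values())
    # Guard against a zero gradient so the ratio stays well defined.
    ratio = largest / smallest if smallest > 0 else math.inf
    return norms, ratio


# Made-up gradients for two illustrative tasks (not real training data).
toy_grads = {
    "task_a": [30.0, 40.0],
    "task_b": [3.0, 4.0],
}
norms, ratio = imbalance_report(toy_grads)
print(norms, ratio)
```

A ratio far above 1 would indicate that one task dominates the shared parameter update. Explaining why such a ratio arises, when rewards, advantages, and sequence lengths do not account for it, is the problem posed here.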
References
Although such imbalances appear to be from the inherent differences between tasks, we cannot yet explain why it arises, despite studying a broad range of training statistics.
— Imbalanced Gradients in RL Post-Training of Multi-Task LLMs
(2510.19178 - Wu et al., 22 Oct 2025) in Section: Limitations