Underlying causes of cross-task gradient imbalance

Identify and explain the underlying mechanisms that cause cross-task gradient magnitude imbalances during reinforcement learning post-training of multi-task large language models, beyond correlations with standard training statistics such as rewards, advantages, and sequence lengths.

Background

The paper shows that gradient magnitudes vary widely across tasks and that these differences do not align with learning gains. Further analyses fail to explain the imbalance using common training statistics (e.g., reward, advantage, token length), suggesting that it may stem from inherent differences between tasks.
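To make the measured quantity concrete, the following is a minimal sketch of how per-task gradient magnitudes could be compared. It is not the authors' code: the helper names (`grad_norm`, `per_task_grad_norms`, `imbalance_ratio`), the task labels, and the numbers are all hypothetical, and the gradient is assumed to be available as a flat list of floats per task.

```python
import math


def grad_norm(grad):
    """L2 norm of a gradient given as a flat list of floats."""
    return math.sqrt(sum(g * g for g in grad))


def per_task_grad_norms(grads_by_task):
    """Map each task name to the L2 norm of its gradient."""
    return {task: grad_norm(g) for task, g in grads_by_task.items()}


def imbalance_ratio(norms):
    """Largest per-task norm divided by the smallest (1.0 means balanced)."""
    values = list(norms.values())
    return max(values) / min(values)


# Made-up gradients for illustration only; not taken from the paper.
toy_grads = {
    "task_a": [3.0, 4.0],
    "task_b": [0.3, 0.4],
}

norms = per_task_grad_norms(toy_grads)
ratio = imbalance_ratio(norms)
print(norms, ratio)
```

A ratio far above 1 indicates that one task dominates the summed update. The open problem is why such ratios arise at all, which a measurement like this cannot answer on its own.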

Despite these observations, the authors explicitly state that they cannot yet explain why the imbalance arises, making the identification of its root causes an open problem.

References

Although such imbalances appear to be from the inherent differences between tasks, we cannot yet explain why it arises, despite studying a broad range of training statistics.

— Imbalanced Gradients in RL Post-Training of Multi-Task LLMs (2510.19178 - Wu et al., 22 Oct 2025) in Section: Limitations
