Methods to mitigate gradient imbalance in multi-task RL post-training
Develop and evaluate training methods that mitigate cross-task gradient magnitude imbalance during reinforcement learning (RL) post-training of multi-task large language models (LLMs). The goal is to reduce the optimization bias toward tasks that produce large gradients, so they do not dominate the shared update, while preserving performance across all tasks.
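One simple baseline that a study of this problem might compare against is per-task gradient norm equalization. Before the task gradients are combined, each one is rescaled so that every task contributes the same magnitude while keeping its direction. This is not the method of Wu et al.: the source explicitly leaves mitigation strategies open. The sketch below is a toy illustration under stated assumptions. The task names ("math", "dialogue") and helper functions are hypothetical. The 2-D vectors stand in for flattened policy gradients, which in practice would come from separate backward passes over each task's batch.

```python
import math
from typing import Dict, List

Vector = List[float]


def l2_norm(v: Vector) -> float:
    """Euclidean norm of a flattened gradient vector."""
    return math.sqrt(sum(x * x for x in v))


def magnitude_shares(task_grads: Dict[str, Vector]) -> Dict[str, float]:
    """Fraction of the summed per-task gradient norms contributed by each task."""
    norms = {t: l2_norm(g) for t, g in task_grads.items()}
    total = sum(norms.values())
    return {t: n / total for t, n in norms.items()}


def equalize_task_gradients(task_grads: Dict[str, Vector],
                            eps: float = 1e-12) -> Dict[str, Vector]:
    """Rescale each task gradient to the mean per-task L2 norm (hypothetical baseline).

    Directions are preserved; only magnitudes change, so no task dominates the
    combined update purely because its gradient is larger. A zero gradient stays
    zero; very small nonzero gradients get strongly amplified, which a real
    method would likely need to guard against.
    """
    norms = {t: l2_norm(g) for t, g in task_grads.items()}
    target = sum(norms.values()) / len(norms)
    return {t: [x * target / (norms[t] + eps) for x in g]
            for t, g in task_grads.items()}


def combine(task_grads: Dict[str, Vector]) -> Vector:
    """Element-wise average of the task gradients into one update direction."""
    n = len(task_grads)
    dims = len(next(iter(task_grads.values())))
    return [sum(g[i] for g in task_grads.values()) / n for i in range(dims)]


# Toy example: the "math" gradient (norm 5.0) is 10x larger than "dialogue" (norm 0.5).
grads = {"math": [3.0, 4.0], "dialogue": [0.0, 0.5]}
print(magnitude_shares(grads))      # math about 0.91, dialogue about 0.09
balanced = equalize_task_gradients(grads)
print(magnitude_shares(balanced))   # 0.5 each (up to float rounding)
print(combine(balanced))            # about [0.825, 2.475]
```

Rescaling to the mean norm rather than to unit norm keeps the overall step size comparable to the unbalanced average. Note that equal gradient magnitude is only a heuristic proxy for balanced optimization. Whether it actually preserves per-task performance is exactly what the problem asks to be evaluated.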
References
Moreover, we only report these observations; how to design new approaches to mitigate them remains unclear and requires further research.
— Imbalanced Gradients in RL Post-Training of Multi-Task LLMs
(2510.19178 - Wu et al., 22 Oct 2025) in Section: Limitations