Methods to mitigate gradient imbalance in multi-task RL post-training

Develop and evaluate training methods that mitigate cross-task gradient magnitude imbalance during reinforcement learning post-training of multi-task large language models, reducing optimization bias toward large-gradient tasks while preserving performance across tasks.

Background

Wu et al. (2510.19178) document harmful effects of gradient imbalance in multi-task RL post-training and show that naive approaches (e.g., gradient-proportional sampling) do not improve aggregate performance. The authors call for gradient-level corrections and a reconsideration of optimization geometry, but they explicitly state that how to design concrete mitigation methods remains unclear.

Addressing this problem would enable more balanced and effective multi-task training, preventing domination by tasks with large gradients and improving overall convergence across tasks.
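
To make the idea of a gradient-level correction concrete, the sketch below rescales each task's gradient to a common norm before averaging, so that no single task dominates the update through magnitude alone. This is an illustrative assumption, not a method proposed or evaluated by the source; the function name combine_task_gradients, the choice of the mean norm as the target, and the toy task names and values are all hypothetical.

```python
import math


def l2_norm(vec):
    """Euclidean norm of a flat gradient vector."""
    return math.sqrt(sum(x * x for x in vec))


def combine_task_gradients(task_grads, eps=1e-12):
    """Rescale every task gradient to the mean norm across tasks, then average.

    task_grads maps a task name to a flat list of gradient components.
    Hypothetical helper for illustration; not the source paper's method.
    """
    norms = {task: l2_norm(g) for task, g in task_grads.items()}
    target = sum(norms.values()) / len(norms)  # common norm for all tasks
    dim = len(next(iter(task_grads.values())))
    combined = [0.0] * dim
    for task, g in task_grads.items():
        scale = target / (norms[task] + eps)  # eps guards against zero norms
        for i, x in enumerate(g):
            combined[i] += scale * x / len(task_grads)
    return combined, norms


# Toy example: one task's gradient is 100x larger than the other's.
task_grads = {"math": [30.0, 40.0], "code": [0.0, 0.5]}
combined, norms = combine_task_gradients(task_grads)

# Plain averaging, for comparison: the large-gradient task dominates.
naive = [sum(g[i] for g in task_grads.values()) / len(task_grads) for i in range(2)]

print("per-task norms:", norms)
print("naive average:", naive)
print("norm-equalized average:", combined)
```

In the toy example the naive average points almost entirely along the large-gradient task, while the norm-equalized average gives both tasks equal weight in magnitude. Whether such rescaling actually improves aggregate performance in RL post-training is exactly the open question; this sketch only shows the mechanics.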

References

Moreover, we only report these observations; how to design new approaches to mitigate them remains unclear and requires further research.

— Imbalanced Gradients in RL Post-Training of Multi-Task LLMs (2510.19178 - Wu et al., 22 Oct 2025) in Section: Limitations
