Equitable Multi-task Learning (2306.09373v2)

Published 15 Jun 2023 in cs.LG and cs.AI

Abstract: Multi-task learning (MTL) has achieved great success in various research domains, such as CV, NLP, and IR. Because task correlations are complex and competing, naively training all tasks together can lead to inequitable learning, i.e., some tasks are learned well while others are overlooked. Multi-task optimization (MTO) aims to improve all tasks simultaneously, but conventional methods often perform poorly when tasks differ greatly in loss scale or gradient norm magnitude. To address this issue, we investigate the equity problem in MTL in depth and find that regularizing the relative contribution of each task (i.e., its task-specific loss divided by its raw gradient norm) to shared-parameter updates can improve the generalization performance of MTL. Based on our theoretical analysis, we propose a novel multi-task optimization method, named EMTL, to achieve equitable MTL. Specifically, we efficiently add variance regularization to bring the relative contributions of different tasks closer together. Extensive experiments show that EMTL stably outperforms state-of-the-art methods on public benchmark datasets from two different research domains. Furthermore, offline and online A/B tests on multi-task recommendation show that EMTL significantly improves multi-task recommendation, demonstrating the superiority and practicality of our method in industrial settings.
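The mechanism the abstract describes, penalizing the variance of each task's relative contribution (task loss divided by its raw gradient norm on the shared parameters), can be sketched in a few lines. The snippet below is a minimal illustration assuming a PyTorch setup; the function name, the `lam` weight, and the epsilon term are illustrative assumptions, not the paper's published formulation.

  import torch

  def emtl_variance_penalty(losses, shared_params, lam=0.1):
      # Hedged sketch: make each task's relative contribution
      # (loss / gradient norm w.r.t. shared parameters) closer across
      # tasks via a variance penalty. `lam` and the 1e-12 epsilon are
      # illustrative assumptions, not published hyperparameters.
      contributions = []
      for loss in losses:
          # Per-task gradient on the shared parameters; create_graph=True
          # keeps the penalty differentiable, retain_graph=True allows
          # reusing the computation graph for the remaining tasks.
          grads = torch.autograd.grad(loss, shared_params,
                                      retain_graph=True, create_graph=True)
          grad_norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
          contributions.append(loss / (grad_norm + 1e-12))
      contributions = torch.stack(contributions)
      # Penalize spread in relative contributions so no single task
      # dominates the shared-parameter update.
      return sum(losses) + lam * contributions.var()

In a training loop, one would call this on the list of per-task losses and backpropagate the returned total in place of the plain loss sum.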

Authors (2)
  1. Jun Yuan (54 papers)
  2. Rui Zhang (1138 papers)
Citations (1)