Multi-Task Learning with Multi-Task Optimization (2403.16162v1)

Published 24 Mar 2024 in cs.AI

Abstract: Multi-task learning solves multiple correlated tasks simultaneously. However, conflicts may exist between tasks, and in such circumstances a single solution can rarely optimize all of them, leading to performance trade-offs. To arrive at a set of optimized yet well-distributed models that collectively embody different trade-offs in one algorithmic pass, this paper proposes viewing Pareto multi-task learning through the lens of multi-task optimization. Multi-task learning is first cast as a multi-objective optimization problem, which is then decomposed into a diverse set of unconstrained scalar-valued subproblems. These subproblems are solved jointly using a novel multi-task gradient descent method, whose uniqueness lies in the iterative transfer of model parameters among the subproblems during the course of optimization. A theorem proving faster convergence through the inclusion of such transfers is presented. We investigate the proposed multi-task learning with multi-task optimization across problem settings including image classification, scene understanding, and multi-target regression. Comprehensive experiments confirm that the proposed method significantly advances the state of the art in discovering sets of Pareto-optimized models. Notably, on NYUv2, the large image dataset in our experiments, our method achieved hypervolume convergence nearly two times faster than the next-best state-of-the-art method.
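
The decomposition-and-transfer recipe described in the abstract can be made concrete with a small sketch. The following is a minimal illustration under stated assumptions, not the authors' implementation: it uses a plain weighted-sum scalarization over two toy quadratic task losses, and a simple neighbor-averaging rule as the inter-subproblem parameter transfer; all names (f1, f2, transfer_interval, blend) and hyperparameter values are hypothetical.

    # Minimal sketch (assumed, not the paper's code): decompose a bi-objective
    # problem into K scalarized subproblems and optimize them jointly, with
    # periodic parameter transfer among the subproblems.
    import numpy as np

    # Toy per-task losses with conflicting minima at x = 1 and x = -1,
    # standing in for the training losses of a shared model.
    def f1(x): return float(np.sum((x - 1.0) ** 2))
    def f2(x): return float(np.sum((x + 1.0) ** 2))
    def grad_f1(x): return 2.0 * (x - 1.0)
    def grad_f2(x): return 2.0 * (x + 1.0)

    # Weighted-sum decomposition: each weight vector targets a different
    # trade-off, yielding K unconstrained scalar-valued subproblems.
    K = 5
    weights = [(w, 1.0 - w) for w in np.linspace(0.0, 1.0, K)]
    rng = np.random.default_rng(0)
    models = [rng.standard_normal(2) for _ in range(K)]  # one model per subproblem

    lr, steps, transfer_interval, blend = 0.05, 200, 10, 0.3  # assumed values
    for t in range(steps):
        for k, (w1, w2) in enumerate(weights):
            g = w1 * grad_f1(models[k]) + w2 * grad_f2(models[k])
            models[k] = models[k] - lr * g
        # Parameter transfer: periodically blend each subproblem's parameters
        # with its neighbors' (adjacent trade-offs are closely related), then
        # stop transfers halfway so every subproblem settles to its own optimum.
        if t < steps // 2 and t % transfer_interval == 0:
            mixed = [m.copy() for m in models]
            for k in range(K):
                nb = 0.5 * (models[max(k - 1, 0)] + models[min(k + 1, K - 1)])
                mixed[k] = (1.0 - blend) * models[k] + blend * nb
            models = mixed

    for (w1, w2), m in zip(weights, models):
        print(f"w=({w1:.2f}, {w2:.2f})  f1={f1(m):.3f}  f2={f2(m):.3f}")

Running the sketch prints five models spread along the trade-off curve between the two losses, i.e., a set of Pareto-style solutions obtained in one algorithmic pass. The paper's contribution is a principled version of the transfer step, together with a theorem showing that such transfers speed up convergence and experiments on image classification, scene understanding, and multi-target regression.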

Authors (3)
  1. Lu Bai (50 papers)
  2. Abhishek Gupta (226 papers)
  3. Yew-Soon Ong (105 papers)
Citations (1)
