Revisiting Scalarization in Multi-Task Learning: A Theoretical Perspective (2308.13985v2)
Abstract: Linear scalarization, i.e., combining all loss functions by a weighted sum, has been the default choice in the literature of multi-task learning (MTL) since its inception. In recent years, there has been a surge of interest in developing Specialized Multi-Task Optimizers (SMTOs) that treat MTL as a multi-objective optimization problem. However, it remains open whether there is a fundamental advantage of SMTOs over scalarization. In fact, heated debates exist in the community comparing these two types of algorithms, mostly from an empirical perspective. To approach the above question, in this paper, we revisit scalarization from a theoretical perspective. We focus on linear MTL models and study whether scalarization is capable of fully exploring the Pareto front. Our findings reveal that, in contrast to recent works that claimed empirical advantages of scalarization, scalarization is inherently incapable of full exploration, especially for those Pareto optimal solutions that strike balanced trade-offs between multiple tasks. More concretely, when the model is under-parametrized, we reveal a multi-surface structure of the feasible region and identify necessary and sufficient conditions for full exploration. This leads to the conclusion that scalarization is in general incapable of tracing out the Pareto front. Our theoretical results partially answer the open questions in Xin et al. (2021), and provide a more intuitive explanation of why scalarization fails beyond non-convexity. We additionally perform experiments on a real-world dataset using both scalarization and state-of-the-art SMTOs. The experimental results not only corroborate our theoretical findings, but also unveil the potential of SMTOs in finding balanced solutions, which cannot be achieved by scalarization.
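To make the abstract's central objects concrete, here is a minimal sketch in standard multi-objective notation; the symbols (task losses L_k, shared parameters θ, weight simplex Δ_K) are illustrative conventions, not taken verbatim from the paper.

```latex
% A minimal, self-contained sketch in standard multi-objective notation.
% L_1, ..., L_K denote the per-task losses; \theta the shared parameters.
\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}

Linear scalarization fixes convex weights on the simplex and minimizes the
weighted sum of task losses:
\[
  \min_{\theta} \; \sum_{k=1}^{K} \lambda_k L_k(\theta),
  \qquad
  \lambda \in \Delta_K := \Bigl\{ \lambda \in \mathbb{R}^{K} :
    \lambda_k \ge 0,\ \sum_{k=1}^{K} \lambda_k = 1 \Bigr\}.
\]

A point $\theta^{\star}$ is \emph{Pareto optimal} if no $\theta$ weakly
improves every task and strictly improves at least one:
\[
  \nexists\, \theta \ \text{s.t.}\ \,
  L_k(\theta) \le L_k(\theta^{\star}) \ \text{for all } k
  \ \text{and}\ \,
  L_j(\theta) < L_j(\theta^{\star}) \ \text{for some } j.
\]

``Full exploration'' asks whether sweeping $\lambda$ over $\Delta_K$ traces
out the entire Pareto front; the paper's answer, for under-parametrized
linear MTL, is in general no.
\end{document}
```

The classical caveat is that every minimizer of a weighted sum with strictly positive weights is Pareto optimal, but the converse can fail when the achievable loss region is non-convex; the abstract's claim is that in under-parametrized linear MTL the failure is structural and goes beyond this non-convexity argument, which is the gap the SMTO literature targets.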
References
- R. Bhatia. Matrix analysis, volume 169. Springer Science & Business Media, 2013.
- S. Boyd and L. Vandenberghe. Convex optimization. Cambridge University Press, 2004.
- R. Caruana. Multitask learning. Machine Learning, 28:41–75, 1997.
- L. Chen, H. Fernando, Y. Ying, and T. Chen. Three-way trade-off in multi-objective learning: Optimization, generalization and conflict-avoidance. arXiv preprint arXiv:2305.20057, 2023.
- Z. Chen, J. Ngiam, Y. Huang, T. Luong, H. Kretzschmar, Y. Chai, and D. Anguelov. Just pick a sign: Optimizing deep multitask models with gradient sign dropout. Advances in Neural Information Processing Systems, 33:2039–2050, 2020.
- P. G. Ciarlet. Linear and nonlinear functional analysis with applications, volume 130. SIAM, 2013.
- R. Collobert and J. Weston. A unified architecture for natural language processing: Deep neural networks with multitask learning. In Proceedings of the 25th International Conference on Machine Learning, pages 160–167, 2008.
- M. Crawshaw. Multi-task learning with deep neural networks: A survey. arXiv preprint arXiv:2009.09796, 2020.
- J.-A. Désidéri. Multiple-gradient descent algorithm (MGDA) for multiobjective optimization. Comptes Rendus Mathematique, 350(5-6):313–318, 2012.
- S. S. Du, W. Hu, S. M. Kakade, J. D. Lee, and Q. Lei. Few-shot learning via learning the representation, provably. In International Conference on Learning Representations, 2021. URL https://openreview.net/forum?id=pW2Q2xLwIMD.
- C. Eckart and G. Young. The approximation of one matrix by another of lower rank. Psychometrika, 1(3):211–218, 1936.
- H. Fernando, H. Shen, M. Liu, S. Chaudhury, K. Murugesan, and T. Chen. Mitigating gradient bias in multi-objective learning: A provably convergent approach. In The Eleventh International Conference on Learning Representations, 2023.
- J. Fliege and B. F. Svaiter. Steepest descent methods for multicriteria optimization. Mathematical Methods of Operations Research, 51(3):479–494, 2000.
- T. Hastie, R. Tibshirani, and J. Friedman. The elements of statistical learning: data mining, inference, and prediction, volume 2. Springer, 2009.
- J. M. Holtzman and H. Halkin. Directional convexity and the maximum principle for discrete systems. SIAM Journal on Control, 4(2):263–275, 1966.
- A. Kalai and S. Vempala. Efficient algorithms for online decision problems. Journal of Computer and System Sciences, 71(3):291–307, 2005.
- S. M. Kay. Fundamentals of statistical signal processing: estimation theory. Prentice-Hall, Inc., 1993.
- V. Kurin, A. De Palma, I. Kostrikov, S. Whiteson, and M. P. Kumar. In defense of the unitary scalarization for deep multi-task learning. Advances in Neural Information Processing Systems, 35:12169–12183, 2022.
- B. Lin, F. Ye, Y. Zhang, and I. W. Tsang. Reasonable effectiveness of random weighting: A litmus test for multi-task learning. Transactions on Machine Learning Research, 2022.
- J. G. Lin. Three methods for determining Pareto-optimal solutions of multiple-objective problems. In Directions in Large-Scale Systems: Many-Person Optimization and Decentralized Control, pages 117–138. Springer, 1976.
- X. Lin, H.-L. Zhen, Z. Li, Q. Zhang, and S. Kwong. Pareto multi-task learning. Advances in Neural Information Processing Systems, 32, 2019.
- X. Lin, Z. Yang, Q. Zhang, and S. Kwong. Controllable Pareto multi-task learning. arXiv preprint arXiv:2010.06313, 2020.
- B. Liu, X. Liu, X. Jin, P. Stone, and Q. Liu. Conflict-averse gradient descent for multi-task learning. Advances in Neural Information Processing Systems, 34:18878–18890, 2021a.
- Towards impartial multi-task learning. In International Conference on Learning Representations, 2021b. URL https://openreview.net/forum?id=IMPnRXEWpvr.
- X. Liu, X. Tong, and Q. Liu. Profiling Pareto front with multi-objective Stein variational gradient descent. Advances in Neural Information Processing Systems, 34:14721–14733, 2021c.
- P. Ma, T. Du, and W. Matusik. Efficient continuous Pareto exploration in multi-task learning. In International Conference on Machine Learning, pages 6522–6531. PMLR, 2020.
- D. Mahapatra and V. Rajan. Multi-task learning with user preferences: Gradient descent with controlled ascent in Pareto optimization. In International Conference on Machine Learning, pages 6597–6607. PMLR, 2020.
- A. Maurer. Bounds for linear multi-task learning. The Journal of Machine Learning Research, 7:117–139, 2006.
- A. Maurer, M. Pontil, and B. Romera-Paredes. The benefit of multitask representation learning. Journal of Machine Learning Research, 17(81):1–32, 2016.
- I. Misra, A. Shrivastava, A. Gupta, and M. Hebert. Cross-stitch networks for multi-task learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3994–4003, 2016.
- M. Momma, C. Dong, and J. Liu. A multi-objective/multi-task learning framework induced by Pareto stationarity. In International Conference on Machine Learning, pages 15895–15907. PMLR, 2022.
- A. Navon, A. Shamsian, G. Chechik, and E. Fetaya. Learning the Pareto front with hypernetworks. In International Conference on Learning Representations, 2021. URL https://openreview.net/forum?id=NjF772F4ZZR.
- A. Navon, A. Shamsian, I. Achituve, H. Maron, K. Kawaguchi, G. Chechik, and E. Fetaya. Multi-task learning as a bargaining game. In International Conference on Machine Learning, pages 16428–16446. PMLR, 2022.
- R. T. Rockafellar. Convex analysis, volume 11. Princeton University Press, 1997.
- M. Ruchte and J. Grabocka. Scalable Pareto front approximation for deep multi-objective learning. In 2021 IEEE International Conference on Data Mining (ICDM), pages 1306–1311. IEEE, 2021.
- S. Ruder. An overview of multi-task learning in deep neural networks. arXiv preprint arXiv:1706.05098, 2017.
- O. Sener and V. Koltun. Multi-task learning as multi-objective optimization. Advances in Neural Information Processing Systems, 31, 2018.
- N. Shaked-Monderer and A. Berman. Copositive and completely positive matrices. World Scientific, 2021.
- N. Tripuraneni, C. Jin, and M. I. Jordan. Provable meta-learning of linear representations. In International Conference on Machine Learning, pages 10434–10443. PMLR, 2021.
- S. Vandenhende, S. Georgoulis, W. Van Gansbeke, M. Proesmans, D. Dai, and L. Van Gool. Multi-task learning for dense prediction tasks: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021.
- R. S. Varga. Matrix iterative analysis, volume 27. Springer Science & Business Media, 1999.
- S. Vijayakumar and S. Schaal. Locally weighted projection regression: An O(n) algorithm for incremental real time learning in high dimensional space. In Proceedings of the Seventeenth International Conference on Machine Learning (ICML 2000), volume 1, pages 288–293. Morgan Kaufmann, 2000.
- Can small heads help? Understanding and improving multi-task generalization. In Proceedings of the ACM Web Conference 2022, pages 3009–3019, 2022.
- S. Wu, H. R. Zhang, and C. Ré. Understanding and improving information transfer in multi-task learning. In International Conference on Learning Representations, 2020. URL https://openreview.net/forum?id=SylzhkBtDB.
- R. Xian, L. Yin, and H. Zhao. Fair and optimal classification via post-processing. In Proceedings of the International Conference on Machine Learning (ICML), 2023.
- D. Xin, B. Ghorbani, J. Gilmer, A. Garg, and O. Firat. Do current multi-task optimization methods in deep learning even help? Advances in Neural Information Processing Systems, 35:13597–13609, 2022.
- M. Ye and Q. Liu. Pareto navigation gradient descent: A first-order algorithm for optimization in Pareto set. In Uncertainty in Artificial Intelligence, pages 2246–2255. PMLR, 2022.
- F. Yu, H. Chen, X. Wang, W. Xian, Y. Chen, F. Liu, V. Madhavan, and T. Darrell. BDD100K: A diverse driving dataset for heterogeneous multitask learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2636–2645, 2020a.
- T. Yu, S. Kumar, A. Gupta, S. Levine, K. Hausman, and C. Finn. Gradient surgery for multi-task learning. Advances in Neural Information Processing Systems, 33:5824–5836, 2020b.
- L. Zadeh. Optimality and non-scalar-valued performance criteria. IEEE Transactions on Automatic Control, 8(1):59–60, 1963.
- C. Zhang, S. Bengio, M. Hardt, B. Recht, and O. Vinyals. Understanding deep learning (still) requires rethinking generalization. Communications of the ACM, 64(3):107–115, 2021.
- Y. Zhang and Q. Yang. An overview of multi-task learning. National Science Review, 5(1):30–43, 2018.
- Y. Zhang and Q. Yang. A survey on multi-task learning. IEEE Transactions on Knowledge and Data Engineering, 34(12):5586–5609, 2021.
- H. Zhao and G. J. Gordon. Inherent tradeoffs in learning fair representations. The Journal of Machine Learning Research, 23(1):2527–2552, 2022.
- On the convergence of stochastic multi-objective gradient manipulation and beyond. Advances in Neural Information Processing Systems, 35:38103–38115, 2022.