Efficient Pareto Manifold Learning with Low-Rank Structure (2407.20734v1)

Published 30 Jul 2024 in cs.LG

Abstract: Multi-task learning, which optimizes performance across multiple tasks, is inherently a multi-objective optimization problem. Various algorithms are developed to provide discrete trade-off solutions on the Pareto front. Recently, continuous Pareto front approximations using a linear combination of base networks have emerged as a compelling strategy. However, it suffers from scalability issues when the number of tasks is large. To address this issue, we propose a novel approach that integrates a main network with several low-rank matrices to efficiently learn the Pareto manifold. It significantly reduces the number of parameters and facilitates the extraction of shared features. We also introduce orthogonal regularization to further bolster performance. Extensive experimental results demonstrate that the proposed approach outperforms state-of-the-art baselines, especially on datasets with a large number of tasks.
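As an illustration of the approach described in the abstract, the sketch below shows one way a preference-conditioned layer could combine a shared main weight with per-task low-rank updates and a soft orthogonality penalty. It is a minimal, hypothetical example in PyTorch, not the authors' implementation; the class names, shapes, rank, and initialization are assumptions made for the illustration.

```python
# Minimal, hypothetical sketch of a preference-conditioned low-rank layer for
# Pareto manifold learning. Not the authors' code; names and shapes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LowRankParetoLinear(nn.Module):
    """Linear layer whose effective weight is a shared base plus a
    preference-weighted sum of per-task low-rank updates:
        W(alpha) = W0 + sum_t alpha_t * B_t @ A_t
    where alpha lies on the probability simplex over tasks."""
    def __init__(self, in_features: int, out_features: int, num_tasks: int, rank: int = 4):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)                   # shared main network
        self.A = nn.Parameter(0.01 * torch.randn(num_tasks, rank, in_features))
        self.B = nn.Parameter(torch.zeros(num_tasks, out_features, rank))  # zero-init: start at W0

    def forward(self, x: torch.Tensor, alpha: torch.Tensor) -> torch.Tensor:
        # Combine the per-task low-rank updates according to the preference vector alpha.
        delta = torch.einsum("t,tor,tri->oi", alpha, self.B, self.A)
        return F.linear(x, self.base.weight + delta, self.base.bias)

def orthogonal_penalty(A: torch.Tensor) -> torch.Tensor:
    """Soft regularizer pushing each task's low-rank factor A_t toward orthonormal rows."""
    gram = torch.einsum("tri,tsi->trs", A, A)                 # per-task Gram matrices A_t A_t^T
    eye = torch.eye(A.shape[1], device=A.device).expand_as(gram)
    return ((gram - eye) ** 2).sum()

# Example: sample a preference vector and evaluate the corresponding trade-off model.
layer = LowRankParetoLinear(in_features=16, out_features=8, num_tasks=3)
alpha = torch.distributions.Dirichlet(torch.ones(3)).sample()  # random point on the simplex
out = layer(torch.randn(5, 16), alpha)
reg = orthogonal_penalty(layer.A)
```

In a training loop of this kind, one would sample a preference vector alpha at each step and minimize the alpha-weighted task losses plus the orthogonality penalty, so that a single shared parameterization covers the continuous trade-off curve.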

Authors (2)
  1. Weiyu Chen (18 papers)
  2. James T. Kwok (65 papers)
Citations (2)
