
Mitigating Gradient Bias in Multi-objective Learning: A Provably Convergent Stochastic Approach (2210.12624v2)

Published 23 Oct 2022 in cs.LG, math.OC, and stat.ML

Abstract: Machine learning problems with multiple objective functions appear either in learning with multiple criteria, where learning has to trade off multiple performance metrics such as fairness, safety, and accuracy; or in multi-task learning, where multiple tasks are optimized jointly, sharing inductive bias between them. These problems are often tackled within the multi-objective optimization framework. However, existing stochastic multi-objective gradient methods and their variants (e.g., MGDA, PCGrad, CAGrad, etc.) all adopt a biased noisy gradient direction, which leads to degraded empirical performance. To address this, we develop a stochastic Multi-objective gradient Correction (MoCo) method for multi-objective optimization. The unique feature of our method is that it can guarantee convergence without increasing the batch size even in the non-convex setting. Simulations on multi-task supervised and reinforcement learning demonstrate the effectiveness of our method relative to state-of-the-art methods.
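
The bias the abstract refers to comes from how MGDA-style methods form the common descent direction: per-task minibatch gradients are plugged directly into a min-norm subproblem, and because that subproblem is nonlinear in the gradients, the resulting direction is biased even when each individual gradient estimate is unbiased. MoCo's remedy, at a high level, is to keep a smoothed tracking estimate of each objective's gradient and update the combination weights with a projected gradient step instead of solving the subproblem exactly. The NumPy sketch below illustrates that structure only: the step sizes and the names `moco_step`, `project_to_simplex`, `Y`, and `lam` are illustrative assumptions, and the paper's projection of the tracking variables onto a bounded set is omitted.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of a vector onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u + (1.0 - css) / idx > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1.0)
    return np.maximum(v + theta, 0.0)

def moco_step(x, Y, lam, stoch_grads, alpha=1e-2, beta=0.5, gamma=1e-2):
    """One bias-corrected multi-gradient step (a sketch, not the paper's exact pseudocode).

    x           -- shared parameters, shape (d,)
    Y           -- tracked per-objective gradient estimates, shape (M, d)
    lam         -- convex combination weights on the simplex, shape (M,)
    stoch_grads -- fresh stochastic gradients of the M objectives at x, shape (M, d)
    """
    # 1) Tracking: pull each gradient estimate toward the fresh stochastic
    #    gradient. Averaging across iterations is what suppresses the bias
    #    that plain stochastic MGDA incurs by feeding minibatch gradients
    #    directly into the min-norm subproblem.
    Y = Y - beta * (Y - stoch_grads)
    # 2) Weights: one projected gradient step on the min-norm objective
    #    0.5 * ||Y^T lam||^2 over the simplex, rather than solving it exactly.
    lam = project_to_simplex(lam - gamma * (Y @ (Y.T @ lam)))
    # 3) Parameters: descend along the resulting common direction.
    x = x - alpha * (Y.T @ lam)
    return x, Y, lam
```

In practice the tracking rate `beta` and the weight step `gamma` would decay on separate schedules; the paper's convergence guarantee hinges on such step-size choices, which this sketch makes no attempt to reproduce.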

References (66)
  1. SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Machine Intell., 39(12):2481–2495, 2017.
  2. Optimization Methods for Large-Scale Machine Learning. SIAM Review, 60(2), 2018.
  3. Closing the gap: Tighter analysis of alternating stochastic gradient methods for bilevel problems. In Proc. Advances in Neural Info. Process. Syst., virtual, 2021.
  4. A single-timescale method for stochastic bilevel optimization. In Proc. of International Conference on Artificial Intelligence and Statistics, pages 2466–2488, 2022.
  5. GradNorm: Gradient normalization for adaptive loss balancing in deep multitask networks. In Proc. of International Conference on Machine Learning, virtual, July 2018.
  6. Just pick a sign: Optimizing deep multitask models with gradient sign dropout. In Proc. Advances in Neural Info. Process. Syst., virtual, December 2020.
  7. The Cityscapes dataset. In CVPR Workshop on the Future of Datasets in Vision, Boston, MA, June 2015.
  8. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 6(2):182–197, 2002.
  9. Implicit Functions and Solution Mappings. Springer, 2009.
  10. Jean-Antoine Désidéri. Multiple-gradient Descent Algorithm (MGDA) for Multi-objective Optimization. Comptes Rendus Mathematique, 350(5-6), 2012.
  11. Efficiently identifying task groupings for multi-task learning. Advances in Neural Information Processing Systems, 34, 2021.
  12. Complexity of Gradient Descent for Multi-objective Optimization. Optimization Methods and Software, 34(5):949–959, 2019.
  13. Approximation Methods for Bi-level Programming. arXiv preprint:1802.02246, 2018.
  14. Divide-and-conquer reinforcement learning. arXiv preprint:1711.09874, 2017.
  15. Random hypervolume scalarizations for provable multi-objective black box optimization. arXiv preprint arXiv:2006.04655, 2020.
  16. Min-max bilevel multi-objective optimization with applications in machine learning. arXiv preprint arXiv:2203.01924, 2022.
  17. Adversarial reweighting for partial domain adaptation. In Proc. Advances in Neural Info. Process. Syst., virtual, December 2021.
  18. Dynamic task prioritization for multitask learning. In Proceedings of the European conference on computer vision, Munich, Germany, July 2018.
  19. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In International conference on machine learning, pages 1861–1870. PMLR, 2018.
  20. A Joint Many-task Model: Growing a Neural Network for Multiple NLP Tasks. arXiv preprint:1611.01587, 2016.
  21. Distilling the knowledge in a neural network. arXiv preprint:1503.02531, 2015.
  22. A two-timescale framework for bilevel optimization: Complexity analysis and application to actor-critic. arXiv preprint arXiv:2007.05170, 2020.
  23. Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. arXiv preprint:1705.07115, 2017.
  24. Joshua Knowles. ParEGO: A hybrid algorithm with on-line landscape approximation for expensive multiobjective optimization problems. IEEE Transactions on Evolutionary Computation, 10(1):50–66, 2006.
  25. Diversity-guided multi-objective bayesian optimization with batch evaluations. Advances in Neural Information Processing Systems, 33:17708–17720, 2020.
  26. Actor-critic-type learning algorithms for Markov decision processes. SIAM Journal on Control and Optimization, 38(1):94–123, 1999.
  27. Multiuser optimization: Distributed algorithms and error analysis. SIAM Journal on Optimization, 21(3):1046–1081, 2011.
  28. STORM+: Fully adaptive SGD with recursive momentum for nonconvex optimization. Advances in Neural Information Processing Systems, 34:20571–20582, 2021.
  29. Baijiong Lin and Yu Zhang. LibMTL: A Python Library for Multi-Task Learning. arXiv preprint:2203.14338, 2022.
  30. Pareto multi-task learning. In Proc. Advances in Neural Info. Process. Syst., Vancouver, Canada, December 2019.
  31. Conflict-Averse Gradient Descent for Multi-task Learning. In Proc. Advances in Neural Info. Process. Syst., virtual, December 2021a.
  32. Towards impartial multi-task learning. In Proc. of International Conference on Learning Representations, virtual, May 2021b.
  33. End-to-end multi-task learning with attention. In Proc. of IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, June 2019.
  34. The Stochastic Multi-gradient Algorithm for Multi-objective Optimization and its Application to Supervised Machine Learning. Annals of Operations Research, pages 1–30, 2021.
  35. Profiling Pareto Front With Multi-Objective Stein Variational Gradient Descent. In Proc. Advances in Neural Info. Process. Syst., virtual, December 2021c.
  36. Exact Pareto optimal search for multi-task learning: Touring the Pareto front. arXiv preprint:2108.00597, 2021.
  37. Attentive single-tasking of multiple tasks. In Proc. of IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, June 2019.
  38. The traveling observer model: Multi-task learning through spatial variable embeddings. arXiv preprint arXiv:2010.02354, 2020.
  39. Cross-stitch networks for multi-task learning. In Proc. of IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, June 2016.
  40. Learning the Pareto front with hypernetworks. In Proc. of International Conference on Learning Representations, virtual, April 2020.
  41. Multi-Task Learning as a Bargaining Game. arXiv preprint:2202.01017, 2022.
  42. Actor-mimic: Deep multitask and transfer reinforcement learning. arXiv preprint:1511.06342, 2015.
  43. Making gradient descent optimal for strongly convex stochastic optimization. arXiv preprint arXiv:1109.5647, 2011.
  44. Routing networks: Adaptive selection of non-linear functions for multi-task learning. arXiv preprint:1711.01239, 2017.
  45. Sebastian Ruder. An overview of multi-task learning in deep neural networks. arXiv preprint:1706.05098, 2017.
  46. Policy distillation. arXiv preprint:1511.06295, 2015.
  47. Adapting visual category models to new domains. In European conference on computer vision, Crete, Greece, September 2010.
  48. Multi-task learning as multi-objective optimization. In Proc. Advances in Neural Info. Process. Syst., Montreal, Canada, December 2018.
  49. Multi-objective Optimization Design through Machine Learning for Drop-on-demand Bioprinting. Engineering, 5(3):586–593, 2019.
  50. Indoor segmentation and support inference from RGBD images. In European conference on computer vision, Firenze, Italy, October 2012.
  51. MTRL - multi-task RL algorithms. GitHub, 2021. URL https://github.com/facebookresearch/mtrl.
  52. Distral: Robust multitask reinforcement learning. Proc. Advances in Neural Info. Process. Syst., December 2017.
  53. Multi-objective SPIBB: Seldonian offline policy improvement with safety constraints in finite MDPs. In Proc. Advances in Neural Info. Process. Syst., virtual, December 2021.
  54. Multi-task learning for dense prediction tasks: A survey. IEEE Trans. Pattern Anal. Machine Intell., 2021.
  55. Deep hashing network for unsupervised domain adaptation. In Proc. of IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, Hawaii, July 2017.
  56. Bridging multi-task learning and meta-learning: Towards efficient training and effective adaptation. In International Conference on Machine Learning, pages 10991–11002. PMLR, 2021.
  57. Characterizing the gap between actor-critic and policy gradient. In Proc. of International Conference on Machine Learning, virtual, July 2021.
  58. A finite-time analysis of two time-scale actor-critic methods. Advances in Neural Information Processing Systems, 33:17617–17628, 2020.
  59. Provably faster algorithms for bilevel optimization. Advances in Neural Information Processing Systems, 34:13670–13682, 2021a.
  60. Multi-task reinforcement learning with soft modularization. In Proc. Advances in Neural Info. Process. Syst., virtual, December 2020.
  61. Pareto policy pool for model-based offline reinforcement learning. In International Conference on Learning Representations, 2021b.
  62. Multi-objective meta learning. In Proc. Advances in Neural Info. Process. Syst., virtual, December 2021.
  63. Development and Application of a Machine Learning Based Multi-objective Optimization Workflow for CO2-EOR Projects. Fuel, 264:116758, 2020.
  64. Gradient surgery for multi-task learning. In Proc. Advances in Neural Info. Process. Syst., virtual, December 2020a.
  65. Meta-World: A benchmark and evaluation for multi-task and meta reinforcement learning. In Conference on Robot Learning, virtual, November 2020b.
  66. Yu Zhang and Qiang Yang. A survey on multi-task learning. IEEE Trans. Knowledge Data Eng., 2021.