Mitigating Gradient Bias in Multi-objective Learning: A Provably Convergent Stochastic Approach (2210.12624v2)
Abstract: Machine learning problems with multiple objective functions appear either in learning with multiple criteria, where learning has to make a trade-off between multiple performance metrics such as fairness, safety, and accuracy, or in multi-task learning, where multiple tasks are optimized jointly, sharing inductive bias between them. These problems are often tackled using the multi-objective optimization framework. However, existing stochastic multi-objective gradient methods and their variants (e.g., MGDA, PCGrad, and CAGrad) all adopt a biased noisy gradient direction, which leads to degraded empirical performance. To this end, we develop a stochastic Multi-objective gradient Correction (MoCo) method for multi-objective optimization. The unique feature of our method is that it can guarantee convergence without increasing the batch size, even in the non-convex setting. Simulations on multi-task supervised and reinforcement learning demonstrate the effectiveness of our method relative to state-of-the-art methods.
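The bias that the abstract refers to can be illustrated with a small numerical sketch. The example below is not the MoCo algorithm and is not taken from the paper; it only assumes the standard closed form for the two-objective min-norm (MGDA-style) direction, and the gradients, noise level, and function names (`min_norm_direction`, `estimate_bias`) are hypothetical choices made for illustration. It shows that the noisy gradients are unbiased, yet the direction built from a single noisy sample is not, because the min-norm weight is a nonlinear function of the gradients.

```python
import numpy as np


def min_norm_direction(g1, g2):
    """Min-norm point on the segment between g1 and g2 (two-objective case).

    Accepts single vectors or batches of shape (n, d); the last axis is the
    gradient dimension.
    """
    diff = g1 - g2
    denom = np.sum(diff * diff, axis=-1, keepdims=True)
    # Avoid dividing by zero when the two gradients coincide.
    safe = np.where(denom > 0.0, denom, 1.0)
    numer = np.sum((g2 - g1) * g2, axis=-1, keepdims=True)
    lam = np.clip(numer / safe, 0.0, 1.0)  # weight on g1, kept in [0, 1]
    return lam * g1 + (1.0 - lam) * g2


def estimate_bias(num_samples=20000, sigma=1.0, seed=0):
    """Compare the direction from noisy gradients with the true direction."""
    rng = np.random.default_rng(seed)
    # Hypothetical true gradients of two objectives at some fixed point.
    g1 = np.array([1.0, 0.5])
    g2 = np.array([1.0, -0.5])
    d_true = min_norm_direction(g1, g2)

    # Unbiased noisy gradients: true gradient plus zero-mean Gaussian noise.
    n1 = g1 + sigma * rng.standard_normal((num_samples, 2))
    n2 = g2 + sigma * rng.standard_normal((num_samples, 2))

    # Direction from one noisy sample at a time, averaged over samples.
    d_single = min_norm_direction(n1, n2).mean(axis=0)
    # Direction from gradients averaged over the whole (very large) batch.
    d_batch = min_norm_direction(n1.mean(axis=0), n2.mean(axis=0))
    return d_true, d_single, d_batch


if __name__ == "__main__":
    d_true, d_single, d_batch = estimate_bias()
    print("true direction:            ", d_true)
    print("mean single-sample direction:", d_single)
    print("large-batch direction:     ", d_batch)
```

In this toy setting the averaged single-sample direction stays away from the true direction no matter how many samples are averaged, while the direction computed from batch-averaged gradients approaches it. This is the trade-off the abstract points to: growing the batch reduces the bias, and the stated aim of MoCo is to obtain convergence without doing so.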