Multi-Task Learning as Multi-Objective Optimization (1810.04650v2)

Published 10 Oct 2018 in cs.LG and stat.ML

Abstract: In multi-task learning, multiple tasks are solved jointly, sharing inductive bias between them. Multi-task learning is inherently a multi-objective problem because different tasks may conflict, necessitating a trade-off. A common compromise is to optimize a proxy objective that minimizes a weighted linear combination of per-task losses. However, this workaround is only valid when the tasks do not compete, which is rarely the case. In this paper, we explicitly cast multi-task learning as multi-objective optimization, with the overall objective of finding a Pareto optimal solution. To this end, we use algorithms developed in the gradient-based multi-objective optimization literature. These algorithms are not directly applicable to large-scale learning problems since they scale poorly with the dimensionality of the gradients and the number of tasks. We therefore propose an upper bound for the multi-objective loss and show that it can be optimized efficiently. We further prove that optimizing this upper bound yields a Pareto optimal solution under realistic assumptions. We apply our method to a variety of multi-task deep learning problems including digit classification, scene understanding (joint semantic segmentation, instance segmentation, and depth estimation), and multi-label classification. Our method produces higher-performing models than recent multi-task learning formulations or per-task training.

Multi-Task Learning as Multi-Objective Optimization

Overview

The paper presents an approach to multi-task learning (MTL) that frames it as a multi-objective optimization problem. Traditional MTL methods optimize a proxy objective, a weighted linear combination of task-specific losses, which is only justified when tasks do not compete, a rare situation in practice. The authors instead apply gradient-based algorithms from the multi-objective optimization literature to find Pareto optimal solutions, directly addressing the conflicts between tasks that are inherent in MTL. Their contribution includes an efficient algorithm that optimizes an upper bound of the multi-objective loss and scales to large learning problems. Empirical evaluations on digit classification, scene understanding, and multi-label classification show that the method outperforms existing MTL approaches.

Introduction

The idea of MTL has roots in Stein's paradox, which highlights the benefit of jointly estimating seemingly independent quantities that share a data-generating process. In MTL, where multiple tasks are solved concurrently, the inductive bias learned for one task can beneficially inform the others. The prevailing practice, however, relies on hard or soft parameter sharing trained with a weighted sum of per-task losses, which becomes problematic when tasks compete for shared capacity. The authors argue for shifting from this weighted-sum proxy to a multi-objective optimization framework, aiming for Pareto optimality: solutions that cannot be improved on any task without degrading another.
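
For reference, the underlying formulation can be stated explicitly. The notation below (shared parameters theta^sh, task-specific parameters theta^t, and empirical per-task losses L^t) follows the hard parameter-sharing setup used in the paper.

```latex
% Multi-objective formulation of MTL with hard parameter sharing:
% the vector of empirical task losses is minimized jointly over shared
% parameters \theta^{sh} and task-specific parameters \theta^{1},\dots,\theta^{T}.
\min_{\theta^{sh},\,\theta^{1},\dots,\theta^{T}}
  \left( \hat{\mathcal{L}}^{1}(\theta^{sh},\theta^{1}),\; \dots,\; \hat{\mathcal{L}}^{T}(\theta^{sh},\theta^{T}) \right)^{\top}

% A solution dominates another if it is no worse on every task loss and strictly
% better on at least one; a solution is Pareto optimal if no solution dominates it.
```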

Proposed Methodology

The authors build on the multiple-gradient descent algorithm (MGDA) to navigate the multi-objective landscape. MGDA characterizes Pareto stationarity through the Karush-Kuhn-Tucker (KKT) conditions: at each step it seeks the minimum-norm point in the convex hull of the per-task gradients, which is either zero (the current point is Pareto stationary) or yields a common descent direction that decreases every task loss. Since this constrained quadratic subproblem must be solved at every training iteration over gradients as large as the network's parameter vector, the authors adopt a Frank-Wolfe-based solver that operates on the simplex of task weights, requires only gradient inner products, and uses an analytical line search between pairs of points.
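
Concretely, the per-step subproblem is to find weights alpha on the probability simplex minimizing ||sum_t alpha_t g_t||^2 over the per-task gradients g_t. The NumPy sketch below is our own illustration of a Frank-Wolfe solver for this subproblem, using the closed-form two-point line search; the function names are placeholders, and this is not the authors' released implementation.

```python
import numpy as np

def min_norm_2(v11, v12, v22):
    """Closed-form weight a in [0, 1] minimizing ||a*v1 + (1-a)*v2||^2,
    given the inner products v11 = v1.v1, v12 = v1.v2, v22 = v2.v2."""
    if v12 >= v22:
        return 0.0   # boundary case: all weight on v2
    if v12 >= v11:
        return 1.0   # boundary case: all weight on v1
    return (v22 - v12) / (v11 - 2.0 * v12 + v22)  # interior optimum of the 1-D quadratic

def frank_wolfe_min_norm(grads, iters=50, tol=1e-6):
    """Weights alpha on the simplex minimizing ||sum_t alpha_t * g_t||^2.
    grads: list of flattened per-task gradient vectors (NumPy arrays)."""
    T = len(grads)
    M = np.array([[gi @ gj for gj in grads] for gi in grads])  # Gram matrix of gradients
    alpha = np.full(T, 1.0 / T)
    for _ in range(iters):
        t_hat = int(np.argmin(M @ alpha))   # Frank-Wolfe vertex: least-correlated task gradient
        # Exact line search between the current combination (v1) and the chosen vertex (v2).
        v11 = alpha @ M @ alpha
        v12 = alpha @ M[t_hat]
        v22 = M[t_hat, t_hat]
        a = min_norm_2(v11, v12, v22)       # weight kept on the current combination
        new_alpha = a * alpha
        new_alpha[t_hat] += 1.0 - a
        if np.abs(new_alpha - alpha).sum() < tol:
            alpha = new_alpha
            break
        alpha = new_alpha
    return alpha
```

If the resulting combination sum_t alpha_t g_t has (near-)zero norm, the current parameters are already Pareto stationary; otherwise its negation is a direction that decreases every task loss.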

To further enhance efficiency, the authors introduce an upper bound on the multi-objective loss for the encoder-decoder architectures prevalent in deep networks: per-task gradients are taken with respect to the shared representation rather than the shared parameters. With this approximation, the gradients needed by the min-norm solver are cheap to obtain, and the expensive backward pass through the shared encoder is performed only once per update, for the weighted combination of losses, substantially reducing computational overhead. The accompanying analysis shows that optimizing this upper bound still yields a Pareto optimal solution, provided the Jacobian of the shared representation with respect to the shared parameters is full rank, a condition that is realistic in practice.
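
Putting the pieces together, one training step under this upper-bound variant might look roughly like the PyTorch sketch below. It reuses the frank_wolfe_min_norm helper from the previous snippet; mtl_step, shared_encoder, task_heads, and task_losses are hypothetical names, and for brevity the task-specific heads here also receive the weighted loss, whereas the paper updates them with their own unweighted gradients.

```python
import torch

def mtl_step(shared_encoder, task_heads, task_losses, x, targets, optimizer):
    """One MGDA-UB-style update (illustrative sketch, not the authors' code)."""
    z = shared_encoder(x)  # shared representation; keep the graph for the final backward pass
    losses = [loss_fn(head(z), y)
              for head, loss_fn, y in zip(task_heads, task_losses, targets)]

    # Per-task gradients w.r.t. the shared representation only:
    # each call backpropagates through its task head, not through the encoder.
    rep_grads = [
        torch.autograd.grad(loss, z, retain_graph=True)[0].flatten().detach().cpu().numpy()
        for loss in losses
    ]

    # Min-norm weights over the representation gradients (the upper-bound surrogate).
    alpha = frank_wolfe_min_norm(rep_grads)

    # A single backward pass through the whole network with the weighted loss.
    optimizer.zero_grad()
    total = sum(float(a) * loss for a, loss in zip(alpha, losses))
    total.backward()
    optimizer.step()
    return alpha, [float(loss) for loss in losses]
```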

Experiments

The empirical performance of the proposed method was demonstrated across three experimental setups:

  1. MultiMNIST: An adaptation of the MNIST dataset for MTL, highlighting the method's ability to evenly distribute model capacity between two conflicting digit classification tasks. The proposed method matched the single-task baseline performance, outperforming static or heuristic-based scaling strategies.
  2. Multi-Label Classification on CelebA: This setting involved treating each face attribute as a separate task, resulting in a 40-task MTL problem. The proposed methodology not only improved average multi-label classification accuracy but also performed consistently well across individual tasks compared to traditional uniform scaling.
  3. Scene Understanding with Cityscapes: Addressing three distinct tasks (semantic segmentation, instance segmentation, and depth estimation) within a single model, the method surpassed single-task performance, indicating beneficial task interactions.

Conclusions and Implications

The research provides a compelling argument for viewing MTL through the lens of multi-objective optimization, offering a pathway to embrace the inherent conflicts of multi-task settings rather than sidestepping them through simplistic aggregations. The approach promises a more robust framework for developing high-capacity models that cater to multiple objectives simultaneously, aligning with the growing trend towards comprehensive AI systems capable of solving a wide array of tasks. Looking forward, this framework will likely stimulate further research into multi-objective methods in MTL and find broader applications across domains that demand simultaneous optimization of competing objectives.

In conclusion, by redefining MTL as a multi-objective optimization problem, the authors lay a foundation for future developments in AI systems that need to efficiently balance trade-offs between multiple learning objectives.

Authors (2)
  1. Ozan Sener (28 papers)
  2. Vladlen Koltun (114 papers)
Citations (1,133)