Gradient Similarity Surgery in Multi-Task Deep Learning
The paper "Gradient Similarity Surgery in Multi-Task Deep Learning" presents an innovative approach to the optimization challenges posed by conflicting gradients in multi-task deep learning (MTDL) models. This work is pivotal for researchers focusing on the enhancement of multi-task learning (MTL) systems, where the objective is to learn multiple tasks simultaneously using a single model to leverage shared representations for improved generalization and efficiency across tasks.
Core Contributions
The authors introduce Similarity-Aware Momentum Gradient Surgery (SAM-GS), a technique that addresses conflicting gradients in MTDL. The problem arises when gradients from different tasks differ in magnitude or point in conflicting directions, which impedes convergence.
SAM-GS distinguishes itself by introducing a gradient magnitude similarity measure that guides the optimization process. The measure determines how the per-task gradients are adjusted, preventing any single task from dominating the update because of an excessively large gradient.
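A common way to quantify magnitude similarity between two task gradients, used throughout the gradient-surgery literature, is the ratio 2‖g_i‖‖g_j‖ / (‖g_i‖² + ‖g_j‖²). The sketch below assumes a measure of this form; the paper's exact definition may differ in detail.

```python
import torch

def magnitude_similarity(g_i: torch.Tensor, g_j: torch.Tensor, eps: float = 1e-12) -> torch.Tensor:
    """Gradient magnitude similarity between two flattened task gradients.

    Returns a value in (0, 1]: 1 when the L2 norms are equal, tending to 0 as
    the magnitudes diverge. Assumed form for illustration; the paper's exact
    measure may differ.
    """
    n_i, n_j = g_i.norm(), g_j.norm()
    return 2.0 * n_i * n_j / (n_i ** 2 + n_j ** 2 + eps)
```

A score near 1 indicates the task gradients are already balanced; a low score signals a magnitude conflict that triggers the mechanisms described in the next section.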
Methodological Insights
SAM-GS applies two mechanisms (illustrated in the sketch after this list):
- Gradient Equalisation: When the gradient magnitudes of the tasks are dissimilar, SAM-GS equalises them so that the joint update is not biased towards the task with the larger gradient.
- Momentum-Based Regularisation: The contribution of the momentum term in the gradient update is modulated according to the gradient magnitude similarity, dynamically tuning the influence of momentum to improve stability and efficiency.
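A minimal sketch of how these two mechanisms could be combined in a single update step is shown below. The threshold `tau`, the rescaling to the mean norm, and the gating of the momentum buffer by the similarity score are illustrative assumptions rather than the paper's exact algorithm.

```python
import torch

def samgs_style_step(task_grads, momentum_buf, lr=1e-3, beta=0.9, tau=0.9, eps=1e-12):
    """One illustrative update combining gradient equalisation and
    similarity-modulated momentum for a list of flattened per-task gradients.
    The threshold, rescaling rule, and momentum gating are assumptions."""
    norms = torch.stack([g.norm() for g in task_grads])

    # Magnitude similarity across tasks: 1 when all norms match,
    # small when the largest and smallest norms are far apart.
    sim = 2.0 * norms.min() * norms.max() / (norms.min() ** 2 + norms.max() ** 2 + eps)

    if sim < tau:
        # Gradient equalisation: rescale every task gradient to the mean norm
        # so that no single task dominates the joint update.
        mean_norm = norms.mean()
        task_grads = [g * (mean_norm / (g.norm() + eps)) for g in task_grads]

    joint_grad = torch.stack(task_grads).sum(dim=0)

    # Momentum-based regularisation: the similarity score scales the momentum
    # contribution, so past updates weigh more when task magnitudes agree.
    momentum_buf = beta * sim * momentum_buf + joint_grad
    return -lr * momentum_buf, momentum_buf

# Example: two tasks with very different gradient scales.
g1, g2 = torch.randn(10), 20.0 * torch.randn(10)
buf = torch.zeros(10)
step, buf = samgs_style_step([g1, g2], buf)
```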
The strategy is notable in that it disregards angle-based (direction) conflicts, on the grounds that they primarily affect convergence speed rather than the solution reached. It instead targets magnitude conflicts, which can pull the optimization away from a balanced solution and thereby degrade task performance.
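A small numeric example shows why magnitude conflicts matter: when one task's gradient is two orders of magnitude larger than another's, the naively summed update is determined almost entirely by the larger task, even when the directions roughly agree (the values below are purely illustrative).

```python
import torch

g_small = torch.tensor([1.0, 0.0])      # task A: small gradient
g_large = torch.tensor([100.0, 10.0])   # task B: large gradient, similar direction

joint = g_small + g_large                # tensor([101., 10.]) -- dominated by task B
ratio = g_small.norm() / g_large.norm()  # ~0.01: strong magnitude conflict
print(joint, ratio)
```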
Evaluation and Results
The paper validates SAM-GS through extensive experiments on synthetic tasks, multi-task supervised learning benchmarks, and multi-task reinforcement learning scenarios. On synthetic problems, particularly those with multiple optima, SAM-GS reaches the global optima more reliably and efficiently than existing methods.
On real-world benchmarks such as NYU-v2, CelebA, and Cityscapes, SAM-GS consistently achieves competitive or superior performance against state-of-the-art methods such as MGDA, PCGrad, GradNorm, and Nash-MTL. The results indicate that SAM-GS handles a large number of tasks while maintaining or improving overall accuracy and convergence stability.
Implications and Future Directions
The introduction of SAM-GS paves the way for further exploration of optimization strategies that leverage gradient similarity metrics. By regularising updates through gradient magnitude similarity, SAM-GS offers a practical tool for improving the training efficiency of MTL models.
The broader implications of this work suggest potential applicability in fields such as natural language processing, computer vision, and other areas where MTL is prevalent. Future research may focus on exploring theoretical guarantees for convergence, extending the method's applicability to more complex model architectures, and improving resilience against other forms of gradient conflicts, such as those introduced by adversarial tasks.
Overall, the paper is a meaningful contribution to the multi-task learning community, presenting a robust, scalable method for one of the central challenges in training multi-task deep learning models. Researchers can build on this approach to further refine multi-task optimization techniques and explore its adaptability to complex, real-world datasets.