SAM-GS: Similarity-Aware Momentum Gradient Surgery
- The paper introduces SAM-GS, which regularizes gradient aggregation in multi-task deep learning by adaptively switching between gradient equalisation and momentum modulation.
- It leverages a formally defined gradient magnitude similarity metric to detect conflicts and ensure fair convergence across diverse tasks.
- Experimental results on synthetic and real-world benchmarks demonstrate SAM-GS’s superior stability, fairness, and efficiency compared to conventional methods.
Similarity-Aware Momentum Gradient Surgery (SAM-GS) is an optimization methodology developed for multi-task deep learning (MTDL) that regularizes the gradient aggregation process based on gradient magnitude similarity across tasks. Its primary motivation is to resolve conflicts that arise from disparities in the magnitude of task-specific gradients, ensuring fair and efficient convergence when training a single model on multiple heterogeneous objectives. SAM-GS adaptively switches between gradient equalisation and momentum modulation, driven by a formally defined similarity metric. The approach is applicable to both synthetic multi-task optimization and real-world deep learning workloads involving multiple supervised or reinforcement learning objectives (Borsani et al., 6 Jun 2025).
1. Motivation and Problem Context
In the MTDL setting, the training process often suffers from conflicting gradients, a phenomenon where loss functions from different tasks yield gradients with either disparate magnitudes (causing some tasks to dominate updates and others to be neglected) or opposing directions. Traditional optimization schemes such as uniform averaging, static task weighting, and canonical aggregation do not resolve such gradient conflicts and can result in optimization bias, inefficient convergence, and suboptimal generalization.
Classic gradient surgery techniques (e.g., projection-based approaches) address directional conflicts but typically remain agnostic to magnitude-based conflicts, which are prevalent and increasingly problematic as the number of tasks grows. SAM-GS explicitly targets magnitude-based conflicts, using gradient magnitude similarity as its principal indicator of when corrective action is necessary.
2. Mathematical Foundations: Gradient Magnitude Similarity
SAM-GS introduces a gradient magnitude similarity measure between any two task gradients $g_i$ and $g_j$:

$$S(g_i, g_j) = \frac{2\,\lVert g_i \rVert_2\,\lVert g_j \rVert_2}{\lVert g_i \rVert_2^2 + \lVert g_j \rVert_2^2}$$

This measure is symmetric, bounded within $[0, 1]$, and attains its maximum value of $1$ when both gradients have identical magnitudes. For a collection of $T$ tasks, the average pairwise magnitude similarity at iteration $t$ is

$$\bar{S}_t = \frac{2}{T(T-1)} \sum_{i < j} S\!\left(g_i^{(t)}, g_j^{(t)}\right)$$

A low $\bar{S}_t$ indicates pronounced magnitude disparities (i.e., conflict), while a high $\bar{S}_t$ indicates similar gradient strengths across tasks.
SAM-GS intentionally ignores directional (angle-based) conflicts, focusing instead on the regularization of magnitude spread.
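As a concrete check of the metric, the minimal sketch below computes the pairwise and average magnitude similarities for a set of task gradients; the function names, the `1e-12` stabiliser, and the toy gradients are illustrative choices, not part of the paper's reference implementation.

```python
import numpy as np

def magnitude_similarity(g_i, g_j):
    """Pairwise gradient magnitude similarity: 1 when norms match, tending to 0 as they diverge."""
    n_i, n_j = np.linalg.norm(g_i), np.linalg.norm(g_j)
    return 2.0 * n_i * n_j / (n_i**2 + n_j**2 + 1e-12)

def average_similarity(grads):
    """Mean pairwise magnitude similarity over a list of task gradients."""
    T = len(grads)
    pairs = [(i, j) for i in range(T) for j in range(i + 1, T)]
    return sum(magnitude_similarity(grads[i], grads[j]) for i, j in pairs) / len(pairs)

# Toy example: the second task's gradient is ten times larger than the first's.
g1, g2 = np.ones(4), 10.0 * np.ones(4)
print(magnitude_similarity(g1, g2))   # ~0.198 -> pronounced magnitude conflict
print(average_similarity([g1, g2]))   # identical here, since there is only one pair
```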
3. Mechanisms: Algorithmic Structure of SAM-GS
The SAM-GS update protocol operates in four primary stages per iteration:
- Gradient Computation & Similarity Assessment: Compute the gradient $g_i^{(t)} = \nabla_\theta \mathcal{L}_i(\theta_t)$ for each task $i$, then calculate the average magnitude similarity $\bar{S}_t$.
- Momentum (EMA) Maintenance: For each task, update the momentum variable $m_i^{(t)}$ via an exponential moving average, $m_i^{(t)} = \beta\, m_i^{(t-1)} + (1-\beta)\, g_i^{(t)}$.
Track a similarity momentum coefficient $s_t$ with an analogous EMA of the average similarity, $s_t = \gamma\, s_{t-1} + (1-\gamma)\, \bar{S}_t$.
- Adaptive Regularization:
- If $\bar{S}_t < \tau$ (dissimilar, i.e., magnitude conflict): apply gradient equalisation, rescaling each task gradient to a common norm,
$$\tilde{g}_i^{(t)} = \frac{g_i^{(t)}}{\lVert g_i^{(t)} \rVert_2}\,\bar{n}_t$$
Here, $\bar{n}_t = \frac{1}{T}\sum_{j=1}^{T} \lVert g_j^{(t)} \rVert_2$ denotes the average $\ell_2$-norm across tasks.
- If $\bar{S}_t \geq \tau$ (similar magnitudes): switch to momentum modulation, in which the bias-corrected momentum estimates $\hat{m}_i^{(t)}$ take the place of the raw gradients, with the smoothed similarity $s_t$ modulating how strongly the momentum term contributes.
Bias-corrected EMAs are used as in the Adam optimizer.
- Parameter Update: Aggregate the reweighted gradients across all tasks, then update the parameters, $\theta_{t+1} = \theta_t - \eta \cdot \frac{1}{T}\sum_{i=1}^{T} \tilde{g}_i^{(t)}$.
($\odot$ denotes element-wise multiplication.)
The logic of this scheme ensures that updates are cautious and equalized during magnitude conflict, and accelerated using momentum when conflicts are absent.
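To make the control flow concrete, here is a minimal sketch of one SAM-GS-style step, under several stated assumptions: gradients are flattened vectors, the final aggregation is a plain mean, Adam-style bias correction is omitted, and the momentum-modulation rule (blending each raw gradient with its momentum estimate, weighted by the smoothed similarity) is an illustrative choice rather than the paper's exact formula.

```python
import numpy as np

def sam_gs_step(params, grads, state, lr=1e-3, tau=0.8, beta=0.9, gamma=0.9, eps=1e-12):
    """One simplified SAM-GS-style update over flattened per-task gradient vectors."""
    T = len(grads)
    norms = np.array([np.linalg.norm(g) for g in grads])

    # Average pairwise magnitude similarity (Section 2).
    sims = [2 * norms[i] * norms[j] / (norms[i] ** 2 + norms[j] ** 2 + eps)
            for i in range(T) for j in range(i + 1, T)]
    s_bar = float(np.mean(sims))

    # Per-task momentum EMAs and a similarity EMA (bias correction omitted for brevity).
    state["m"] = [beta * m + (1 - beta) * g for m, g in zip(state["m"], grads)]
    state["s"] = gamma * state["s"] + (1 - gamma) * s_bar

    if s_bar < tau:
        # Magnitude conflict: equalise by rescaling every task gradient to the mean norm.
        mean_norm = norms.mean()
        tilde = [g / (n + eps) * mean_norm for g, n in zip(grads, norms)]
    else:
        # Similar magnitudes: momentum modulation -- lean on the momentum estimates,
        # weighted by the smoothed similarity (assumed modulation form).
        tilde = [state["s"] * m + (1 - state["s"]) * g
                 for m, g in zip(state["m"], grads)]

    update = np.mean(tilde, axis=0)
    return params - lr * update, state

# Usage: two tasks with strongly imbalanced gradient norms; the EMA state persists across steps.
params = np.zeros(4)
state = {"m": [np.zeros(4), np.zeros(4)], "s": 1.0}
grads = [np.ones(4), 10.0 * np.ones(4)]
params, state = sam_gs_step(params, grads, state)   # equalisation branch fires here
```

The single threshold check on $\bar{S}_t$ is what moves the optimizer between the cautious, equalised regime and the accelerated, momentum-driven regime described above.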
4. Regularization, Task Fairness, and Hyperparameter Effects
The central regularization effect in SAM-GS is controlled via the similarity threshold $\tau$. When $\bar{S}_t$ falls below $\tau$, equalisation prevents any single task from dominating optimization, which in classical approaches can lead to slow convergence or poor minority-task performance. For large $\bar{S}_t$, most updates use momentum; for very small $\bar{S}_t$, equalisation dominates.
Ablation results demonstrate that intermediate values of $\tau$ ($0.7$–$0.9$) yield the best results, whereas the system degenerates for $\tau = 1$ (always equalise) or $\tau = 0$ (always apply momentum). This highlights the necessity for adaptive, data-driven regularization tuned by the current task gradient statistics.
Hyperparameters (the learning rate $\eta$ and the EMA coefficients $\beta$, $\gamma$) are mostly inherited from standard optimizers (as for Adam), with $\tau$ typically chosen via validation.
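As a worked illustration of the regime switch (the norms and the value $\tau = 0.8$ are hypothetical, with $\tau$ picked from the reported sweet spot): for two tasks with gradient norms $1$ and $5$, the similarity is $2 \cdot 1 \cdot 5 / (1 + 25) \approx 0.38 < \tau$, so equalisation fires; for norms $1$ and $1.2$, the similarity is $2 \cdot 1.2 / (1 + 1.44) \approx 0.98 \geq \tau$, so momentum modulation is used instead.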
5. Experimental Evaluation and Benchmarks
SAM-GS was tested on synthetic Nash-MTL optimization problems and real-world MTDL benchmarks:
- Synthetic Landscape (Nash-MTL):
SAM-GS robustly attained global optima across all tested initializations and retained performance in problems involving multiple global optima and saddle regions, outperforming LS (linear sum), CAGrad, Aligned-MTL, Nash-MTL, and PCGrad.
- CityScapes (2 tasks):
SAM-GS is competitive with PCGrad and specialized angle-focused approaches.
- NYU-v2 (3 tasks), CelebA (40 tasks):
SAM-GS achieves state-of-the-art or superior performance, especially as task count and gradient magnitude variability increase, underscoring its efficacy in scenarios prone to magnitude-based conflicts.
- MetaWorld MT10 (Multi-task Reinforcement Learning):
With the similarity statistic aggregated as a minimum rather than a mean, SAM-GS matches Nash-MTL and outperforms alternatives, as measured by mean ranking and mean percent improvement.
Across all evaluations, SAM-GS yields superior optimization stability and fairness (in terms of mean ranking and mean percent improvement), particularly in configurations where classical approaches suffer from task overshadowing or inefficient learning dynamics.
6. Comparison to Other Gradient Surgery Schemes
SAM-GS belongs to the family of gradient surgery methods that explicitly manipulate gradient aggregation based on task-to-task similarity:
| Method | Conflict Type | Regularization Mechanism | Element of Innovation |
|---|---|---|---|
| PCGrad | Directional | Projection on opposing direction | Negates cosine-conflict |
| GS-Agr/Agr-Sum | Sign | Strict consensus, zero-out | Enforces sign alignment |
| SAM-GS | Magnitude | Magnitude similarity/adaptive | Dynamic equalisation/momentum |
SAM-GS is distinct in explicitly targeting magnitude-based gradient conflicts, whereas principal alternatives focus on sign or angle. This suggests SAM-GS is complementary to direction-aware surgeries and particularly suited to large or highly heterogeneous task constellations where magnitude disparity suppresses minority learning signals (Borsani et al., 6 Jun 2025).
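The sketch below is a hypothetical illustration of that complementarity, not an experiment from the paper: a PCGrad-style projection first removes the directional conflict between two task gradients, after which a SAM-GS-style magnitude equalisation removes the remaining norm disparity before aggregation.

```python
import numpy as np

def pcgrad_project(g_i, g_j, eps=1e-12):
    """PCGrad-style surgery: drop the component of g_i that opposes g_j."""
    dot = float(np.dot(g_i, g_j))
    if dot < 0:
        g_i = g_i - dot / (float(np.dot(g_j, g_j)) + eps) * g_j
    return g_i

def equalise_magnitudes(grads, eps=1e-12):
    """SAM-GS-style magnitude equalisation: rescale every gradient to the mean norm."""
    norms = [np.linalg.norm(g) + eps for g in grads]
    mean_norm = float(np.mean(norms))
    return [g / n * mean_norm for g, n in zip(grads, norms)]

# Two conflicting task gradients: opposite-leaning directions and a 5x norm gap.
g1, g2 = np.array([1.0, 0.0]), np.array([-5.0, 5.0])
projected = [pcgrad_project(g1, g2), pcgrad_project(g2, g1)]
combined = np.mean(equalise_magnitudes(projected), axis=0)
print(combined)
```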
7. Significance, Limitations, and Applicability
SAM-GS is modular, compatible with standard optimizers, and computationally light. It is applicable in any MTDL or multi-objective optimization setting where gradient aggregation is central. Its emphasis on magnitude similarity ensures equitable optimization progress without requiring explicit manual task weighting or specialized tuning of learning rates across tasks.
A plausible implication is that in pure domain generalization or tasks where direction conflicts dominate, angle-based methods may be more effective (as observed in some CityScapes results), while SAM-GS delivers its advantages most pronouncedly where task imbalance is due to norm disparities. Its performance is sensitive to the choice of , but ablation studies reveal that moderate adaptation suffices for typical MTDL workloads.
SAM-GS thereby contributes a mathematically principled, empirically validated, and implementationally tractable approach to harmonizing multi-task optimization by leveraging similarity-aware adaptive regularization of momentum and magnitude.