
PCGrad: Resolving Gradient Conflicts in MTL

Updated 7 May 2026
  • PCGrad projects each task gradient onto the normal plane of any other task gradient it conflicts with, preventing negative interference between tasks and improving convergence and test accuracy.
  • PCGrad is a hyperparameter-free, computationally light method that requires only a minor modification to the standard gradient aggregation step.
  • Empirical studies report consistent gains in accuracy and convergence speed across supervised, reinforcement, and physics-informed learning, as well as LLM unlearning.

Projecting Conflicting Gradients (PCGrad) is a gradient manipulation scheme designed to address the optimization challenges inherent in multi-task learning and composite neural optimization objectives. The method operates by identifying and resolving destructive interference between gradients associated with different task-specific loss terms, directly intervening in the standard gradient aggregation mechanism to promote constructive update directions and mitigate negative transfer. PCGrad was first introduced in the context of deep multi-task learning and has since found applications across supervised, reinforcement, and physics-informed learning paradigms (Yu et al., 2020, Zhou et al., 2021, Bohn et al., 2024, Xiao et al., 16 Apr 2026).

1. Motivation and Theoretical Foundations

Multi-task learning (MTL) and multi-loss frameworks, such as physics-informed neural networks (PINNs), aggregate loss terms $\{\mathcal{L}_i\}_{i=1}^K$ corresponding to different objectives, constraints, or tasks. Optimization traditionally proceeds via summing the gradients, $g = \sum_i \nabla_\theta \mathcal{L}_i$. However, when individual gradients are (i) highly imbalanced in magnitude or (ii) oriented in conflicting directions (i.e., negative cosine similarity), naïve aggregation leads to destructive interference. This results in oscillations, slow convergence, suboptimal solutions, and the phenomenon of negative transfer, where improvement in one task comes at the expense of another (Yu et al., 2020, Zhou et al., 2021).

Following Yu et al. (2020), two task gradients $g_i$ and $g_j$ are said to conflict when their cosine similarity is negative:

$$\omega(g_i, g_j) = \frac{g_i \cdot g_j}{\|g_i\|\,\|g_j\|} < 0.$$

In that case, a gradient step of size $\eta$ taken for task $i$ alone changes $\mathcal{L}_j$ by approximately $-\eta\, g_i \cdot g_j > 0$ to first order; that is, progress on one task increases the other task's loss through the shared parameter vector. Empirical studies further reveal that such conflicts are both common and detrimental across domains, motivating algorithmic intervention.
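The conflict test follows directly from the definition of $\omega$. Below is a minimal pure-Python sketch, assuming gradients are plain lists of floats standing in for flattened framework tensors; the helper names are illustrative and do not come from any cited implementation. Treating a zero gradient as non-conflicting is an added assumption.

```python
import math
from typing import List

Vector = List[float]

def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length gradient vectors."""
    return sum(a * b for a, b in zip(u, v))

def cosine_similarity(u: Vector, v: Vector) -> float:
    """omega(g_i, g_j) = (g_i . g_j) / (||g_i|| * ||g_j||)."""
    norm_u, norm_v = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: a zero gradient is treated as non-conflicting
    return dot(u, v) / (norm_u * norm_v)

def conflicts(g_i: Vector, g_j: Vector) -> bool:
    """True when the two task gradients conflict, i.e. omega(g_i, g_j) < 0."""
    return cosine_similarity(g_i, g_j) < 0.0

g_1, g_2 = [1.0, 0.0], [-1.0, 1.0]
print(round(cosine_similarity(g_1, g_2), 4))  # -0.7071
print(conflicts(g_1, g_2))                    # True
```

Since both norms are positive, the sign of $\omega$ equals the sign of $g_i \cdot g_j$, so in practice the conflict check only needs the inner product.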

2. Mathematical Formulation and Algorithm

PCGrad operates by "surgically" removing the component of a task's gradient that directly opposes another task's gradient. The core update for a pair of conflicting gradients is

$$g_i \longleftarrow g_i - \frac{g_i \cdot g_j}{\|g_j\|^2}\, g_j, \quad \text{if } g_i \cdot g_j < 0,$$

where $g_i$ and $g_j$ are the gradients of tasks $i$ and $j$ with respect to the shared model parameters. This projection ensures that, post-modification, the updated $g_i$ is non-conflicting with $g_j$: its inner product with $g_j$ becomes $g_i \cdot g_j - g_i \cdot g_j = 0$, so the new $g_i$ lies on the normal plane of $g_j$.
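A minimal sketch of the pairwise projection rule, again assuming plain Python lists as gradient vectors; the function name `project_if_conflicting` is hypothetical.

```python
from typing import List

Vector = List[float]

def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length gradient vectors."""
    return sum(a * b for a, b in zip(u, v))

def project_if_conflicting(g_i: Vector, g_j: Vector) -> Vector:
    """Apply the PCGrad rule to g_i with respect to g_j.

    If g_i . g_j < 0, subtract the component of g_i along g_j,
        g_i <- g_i - (g_i . g_j / ||g_j||^2) * g_j,
    which leaves g_i on the normal plane of g_j. Otherwise return g_i unchanged.
    """
    d = dot(g_i, g_j)
    if d >= 0.0:  # no conflict; this also covers g_j == 0, so no division by zero
        return list(g_i)
    scale = d / dot(g_j, g_j)
    return [a - scale * b for a, b in zip(g_i, g_j)]

g_i, g_j = [1.0, 0.0], [-1.0, 1.0]
g_i_pc = project_if_conflicting(g_i, g_j)
print(g_i_pc)            # [0.5, 0.5]
print(dot(g_i_pc, g_j))  # 0.0
```

In this example $g_i \cdot g_j = -1$ and $\|g_j\|^2 = 2$, so the rule adds $\tfrac{1}{2} g_j$ to $g_i$, and the result is orthogonal to $g_j$.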

The PCGrad algorithm proceeds as follows (Yu et al., 2020, Zhou et al., 2021, Bohn et al., 2024):

  1. For each task $i$, compute $g_i = \nabla_\theta \mathcal{L}_i$ and initialize $g_i^{\mathrm{PC}} = g_i$.
  2. For each task $i$, visit the other tasks $j \neq i$ in a random order:
    • If $g_i^{\mathrm{PC}} \cdot g_j < 0$, project $g_i^{\mathrm{PC}}$ onto the normal plane of $g_j$ using the formula above.
  3. Form the aggregated update $g^{\mathrm{PC}} = \sum_i g_i^{\mathrm{PC}}$.
  4. Update parameters: $\theta \leftarrow \mathrm{Opt}(\theta, g^{\mathrm{PC}})$, where $\mathrm{Opt}$ is any base optimizer.
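Steps 1 to 3 can be sketched in pure Python as follows. This is an illustrative reading of the algorithm in Yu et al. (2020), not their released code: gradients are plain lists, `pcgrad` is a hypothetical helper, each conflict check compares against the other task's original gradient $g_j$, and a seeded `random.Random` makes the task ordering reproducible. Step 4 is left to whatever optimizer consumes the returned vector.

```python
import random
from typing import List, Optional

Vector = List[float]

def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length gradient vectors."""
    return sum(a * b for a, b in zip(u, v))

def pcgrad(grads: List[Vector], rng: Optional[random.Random] = None) -> Vector:
    """Aggregate K task gradients with PCGrad.

    grads[i] is task i's gradient w.r.t. the shared parameters (flattened).
    Returns g_pc = sum_i g_i^PC.
    """
    rng = rng if rng is not None else random.Random()
    k = len(grads)
    projected = [list(g) for g in grads]      # step 1: g_i^PC <- g_i
    for i in range(k):
        others = [j for j in range(k) if j != i]
        rng.shuffle(others)                   # step 2: visit j != i in random order
        for j in others:
            d = dot(projected[i], grads[j])   # conflict check against the original g_j
            if d < 0.0:
                scale = d / dot(grads[j], grads[j])
                projected[i] = [a - scale * b for a, b in zip(projected[i], grads[j])]
    return [sum(col) for col in zip(*projected)]  # step 3: g_pc = sum_i g_i^PC

g_pc = pcgrad([[1.0, 0.0], [-1.0, 1.0]], rng=random.Random(0))
print(g_pc)  # [0.5, 1.5]
```

With the naive sum, the same two gradients give $[0, 1]$, which is orthogonal to $g_1$, so task 1 makes no first-order progress; the PCGrad update $[0.5, 1.5]$ has a positive inner product with both task gradients.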

PCGrad is hyperparameter-free, requires only minor modification to the optimizer's gradient aggregation step, and is compatible with any first-order optimization method (Yu et al., 2020, Zhou et al., 2021).
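Because PCGrad only replaces the gradient handed to the optimizer, it composes with any update rule. The sketch below assumes a toy problem with two hypothetical quadratic task losses $\mathcal{L}_k(\theta) = \tfrac{1}{2}\|\theta - c_k\|^2$ and a small hand-written momentum-SGD class standing in for a library optimizer; both are illustrative assumptions.

```python
import random
from typing import List, Optional

Vector = List[float]

def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))

def pcgrad(grads: List[Vector], rng: random.Random) -> Vector:
    """Compact PCGrad aggregation (project against the original g_j, then sum)."""
    out = [list(g) for g in grads]
    for i in range(len(grads)):
        others = [j for j in range(len(grads)) if j != i]
        rng.shuffle(others)
        for j in others:
            d = dot(out[i], grads[j])
            if d < 0.0:
                s = d / dot(grads[j], grads[j])
                out[i] = [a - s * b for a, b in zip(out[i], grads[j])]
    return [sum(c) for c in zip(*out)]

class MomentumSGD:
    """Stand-in base optimizer; PCGrad only changes the gradient passed to step()."""

    def __init__(self, lr: float, beta: float = 0.0) -> None:
        self.lr, self.beta = lr, beta
        self.velocity: Optional[Vector] = None

    def step(self, theta: Vector, grad: Vector) -> Vector:
        if self.velocity is None:
            self.velocity = [0.0] * len(theta)
        self.velocity = [self.beta * v + g for v, g in zip(self.velocity, grad)]
        return [t - self.lr * v for t, v in zip(theta, self.velocity)]

# Hypothetical toy tasks: L_k(theta) = 0.5 * ||theta - c_k||^2, so grad_k = theta - c_k.
centers = [[1.0, 0.0], [-1.0, 0.0]]

def task_grads(theta: Vector) -> List[Vector]:
    return [[t - c for t, c in zip(theta, ck)] for ck in centers]

def total_loss(theta: Vector) -> float:
    return sum(0.5 * dot(g, g) for g in task_grads(theta))

rng = random.Random(0)
opt = MomentumSGD(lr=0.1)  # beta = 0.0 reduces to plain SGD
theta = [0.0, 2.0]
print(total_loss(theta))   # 5.0
for _ in range(100):
    theta = opt.step(theta, pcgrad(task_grads(theta), rng))
print(total_loss(theta))
```

Swapping `MomentumSGD` for another first-order rule (for example, Adam) would leave `pcgrad` untouched, which is the sense in which the method is optimizer-agnostic.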

3. Implementation and Computational Considerations

PCGrad introduces minimal computational overhead. For $K$ tasks, each iteration requires $K$ backward passes (unless gradient sharing is used) to obtain the per-task gradients, plus $O(K^2)$ inner products between $d$-dimensional gradient vectors for the pairwise conflict checks and projections, where $d$ is the number of shared parameters. For small $K$, such as multi-component physics-informed losses or asymmetric two-task setups ($K = 2$, as in LLM unlearning), the cost is negligible (Zhou et al., 2021, Xiao et al., 16 Apr 2026). The framework is readily integrated with standard neural optimization libraries without additional hyperparameters.
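To make the cost accounting concrete, the following sketch instruments a pure-Python PCGrad step to count pairwise conflict checks and projections. The gradients are hypothetical random vectors, and caching the squared norms $\|g_j\|^2$ once per task is an implementation choice, not something the cited papers prescribe. The $K$ backward passes are outside the sketch: the per-task gradients are assumed to be already computed.

```python
import random
from typing import List, Tuple

Vector = List[float]

def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))

def pcgrad_with_cost(grads: List[Vector], rng: random.Random) -> Tuple[Vector, int, int]:
    """PCGrad aggregation that also counts pairwise conflict checks and projections.

    Squared norms ||g_j||^2 are computed once per task (K inner products), so each
    pairwise check costs one length-d inner product and each projection one
    scaled vector subtraction.
    """
    k = len(grads)
    sq_norms = [dot(g, g) for g in grads]
    out = [list(g) for g in grads]
    checks = projections = 0
    for i in range(k):
        others = [j for j in range(k) if j != i]
        rng.shuffle(others)
        for j in others:
            checks += 1
            d = dot(out[i], grads[j])
            if d < 0.0:
                projections += 1
                s = d / sq_norms[j]
                out[i] = [a - s * b for a, b in zip(out[i], grads[j])]
    return [sum(c) for c in zip(*out)], checks, projections

rng = random.Random(0)
for k in (2, 3, 10):
    # Hypothetical random task gradients with d = 1000 shared parameters.
    grads = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(k)]
    _, checks, projections = pcgrad_with_cost(grads, rng)
    print(k, checks, projections)  # checks is always k * (k - 1)
```

The number of checks grows as $K(K-1)$ regardless of the data, while the number of projections depends on how many pairs actually conflict.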

In "A generic physics-informed neural network-based framework for reliability assessment of multi-state systems" (Zhou et al., 2021), PCGrad is used with PINNs in which the number of loss terms $K$ is determined by $M$, the number of ODE residuals. In large-scale multi-task vision or RL (e.g., MT10, MT50), the overhead remains manageable through batching and vectorized dot-product computation (Yu et al., 2020, Bohn et al., 2024).
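One way to vectorize the dot products, sketched here as an assumption rather than the cited papers' implementation, uses the fact that every projected gradient $g_i^{\mathrm{PC}}$ remains a linear combination of the original task gradients. All conflict checks can then be answered from the $K \times K$ Gram matrix $G_{ab} = g_a \cdot g_b$, computed once (a single batched matrix product in a tensor library), and the final update is a re-weighted sum of the $g_a$. The pure-Python code below compares this formulation with a direct implementation under identical task orderings; all function names are hypothetical.

```python
import random
from typing import List

Vector = List[float]

def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))

def pcgrad_direct(grads: List[Vector], orders: List[List[int]]) -> Vector:
    """Reference PCGrad on full d-dimensional vectors; orders[i] lists the j != i."""
    out = [list(g) for g in grads]
    for i, order in enumerate(orders):
        for j in order:
            d = dot(out[i], grads[j])
            if d < 0.0:
                s = d / dot(grads[j], grads[j])
                out[i] = [a - s * b for a, b in zip(out[i], grads[j])]
    return [sum(c) for c in zip(*out)]

def pcgrad_gram(grads: List[Vector], orders: List[List[int]]) -> Vector:
    """The same update computed from the K x K Gram matrix of task gradients.

    Each g_i^PC stays a combination sum_a coeffs[i][a] * g_a of the original
    gradients, so g_i^PC . g_j = sum_a coeffs[i][a] * gram[a][j], and projecting
    onto the normal plane of g_j only changes coeffs[i][j].
    """
    k = len(grads)
    # In a tensor library this is one product: gram = M @ M.T for the K x d matrix M.
    gram = [[dot(grads[a], grads[b]) for b in range(k)] for a in range(k)]
    coeffs = [[1.0 if a == i else 0.0 for a in range(k)] for i in range(k)]
    for i, order in enumerate(orders):
        for j in order:
            d = sum(coeffs[i][a] * gram[a][j] for a in range(k))
            if d < 0.0:
                coeffs[i][j] -= d / gram[j][j]
    # g^PC = sum_a w_a g_a with w_a = sum_i coeffs[i][a]: a re-weighting of the g_a.
    weights = [sum(coeffs[i][a] for i in range(k)) for a in range(k)]
    dim = len(grads[0]) if grads else 0
    return [sum(weights[a] * grads[a][p] for a in range(k)) for p in range(dim)]

rng = random.Random(1)
K, D = 4, 50
grads = [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(K)]
orders = []
for i in range(K):
    others = [j for j in range(K) if j != i]
    rng.shuffle(others)
    orders.append(others)
fast, slow = pcgrad_gram(grads, orders), pcgrad_direct(grads, orders)
print(max(abs(a - b) for a, b in zip(fast, slow)))
```

The Gram matrix costs $O(K^2 d)$ once; the projection loop then works on $K$-dimensional coefficient vectors ($O(K^3)$ in total), and forming the final update costs $O(Kd)$. The two results should agree up to floating-point round-off.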

4. Empirical Performance and Applications

PCGrad yields consistent performance improvements in supervised learning, reinforcement learning, physics-informed learning, and LLM unlearning:

  • Supervised Learning: On CIFAR-100 (20 tasks), PCGrad raises average test accuracy from 67.7% (single-task) to 77.5% (routing nets + PCGrad) (Yu et al., 2020). On NYUv2 (3 tasks), it improves mean IoU and pixel accuracy, achieving the best results among the multi-task vision architectures compared in that study.
  • Reinforcement Learning: In the Meta-World MT10/MT50 suite, SAC with PCGrad achieves 100%/70% multi-task success rates with significantly fewer samples than independent training (Yu et al., 2020).
  • Physics-Informed Learning: For PINN-based reliability assessment, RMSE is reduced by up to 96.6% on a 12-state system when incorporating PCGrad, and convergence accelerates by an order of magnitude in iteration count (Zhou et al., 2021).
  • Unlearning in LLMs: In asymmetric two-task setups (retention vs. forgetting), module-wise PCGrad projections increase retention performance (e.g., MMLU recovery from 25.1% to 53.0%) at matched forgetting strength, shifting solutions toward the Pareto frontier (Xiao et al., 16 Apr 2026).

5. Extensions, Variants, and Theoretical Insights

PCGrad's pairwise projection mechanism can be generalized:

  • Weighted PCGrad (wPCGrad): Task projection order is made probabilistically dependent on task priority or loss, allowing adaptive focus on underperforming or high-loss tasks (Bohn et al., 2024). This yields further performance gains on datasets such as nuScenes, CIFAR-100, and CelebA.
  • Module-Wise and Layer-Wise PCGrad: Fine-grained projection is applied at the module or layer level (e.g., for LLM unlearning), improving granularity and empirical retention (Xiao et al., 16 Apr 2026).
  • Algorithmic Hybrids: PCGrad can be combined with dynamic weighting schemes such as GradNorm or incorporated alongside global cone-based constraints (ConicGrad) and higher-order subspace projections (GradOPS) to navigate multi-objective trade-offs (Hassanpour et al., 31 Jan 2025, Zhu et al., 5 Mar 2025).
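Module-wise projection can be sketched by running the same PCGrad rule on each module's slice of the gradient. This is a generic illustration with a hypothetical dictionary layout and hypothetical module names (`attn`, `mlp`); it applies symmetric PCGrad within each module and is not the specific asymmetric retention/forgetting scheme of Xiao et al. (16 Apr 2026). The example shows a conflict that is invisible on the flattened gradient but resolved at module granularity.

```python
import random
from typing import Dict, List

Vector = List[float]

def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))

def pcgrad(grads: List[Vector], rng: random.Random) -> Vector:
    """Standard PCGrad aggregation on one block of parameters."""
    out = [list(g) for g in grads]
    for i in range(len(grads)):
        others = [j for j in range(len(grads)) if j != i]
        rng.shuffle(others)
        for j in others:
            d = dot(out[i], grads[j])
            if d < 0.0:
                s = d / dot(grads[j], grads[j])
                out[i] = [a - s * b for a, b in zip(out[i], grads[j])]
    return [sum(c) for c in zip(*out)]

def pcgrad_modulewise(task_grads: List[Dict[str, Vector]], rng: random.Random) -> Dict[str, Vector]:
    """Detect and resolve conflicts separately per module.

    Hypothetical layout: task_grads[i][name] is task i's gradient for module `name`.
    """
    return {name: pcgrad([g[name] for g in task_grads], rng) for name in task_grads[0]}

# Hypothetical two-task example with two modules.
task_a = {"attn": [1.0, 0.0], "mlp": [2.0, 0.0]}
task_b = {"attn": [-1.0, 1.0], "mlp": [3.0, 0.0]}
rng = random.Random(0)

# Flattened: 1*(-1) + 0*1 + 2*3 + 0*0 = 5 > 0, so no conflict is detected globally.
flat = pcgrad([task_a["attn"] + task_a["mlp"], task_b["attn"] + task_b["mlp"]], rng)
print(flat)                                      # [0.0, 1.0, 5.0, 0.0]
print(pcgrad_modulewise([task_a, task_b], rng))  # {'attn': [0.5, 1.5], 'mlp': [5.0, 0.0]}
```

On the flattened vector the `attn` conflict is masked by the aligned `mlp` gradients; the module-wise version projects only where the conflict actually occurs.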

Theoretical results support this design. In the two-task setting, the projected update remains a descent direction for the combined loss, and Yu et al. (2020) show that, for convex losses with Lipschitz-continuous gradients and a sufficiently small step size, PCGrad converges either to the optimum or to a point where the two task gradients are exactly opposed ($\omega(g_1, g_2) = -1$). In the nonconvex regime there is no comparable guarantee; removing only the destructive components is intended to prevent regressive interference and stabilize joint descent, and it empirically supports faster convergence (Yu et al., 2020).
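For two conflicting tasks ($g_1 \cdot g_2 < 0$), the descent property can be checked in a few lines. The expansion below is a direct calculation, not the proof as stated in Yu et al. (2020); it writes $\omega = \omega(g_1, g_2)$ and uses the fact that with two tasks each gradient is projected exactly once, against the other task's original gradient.

```latex
\begin{aligned}
g_1^{\mathrm{PC}} &= g_1 - \frac{g_1 \cdot g_2}{\|g_2\|^2}\, g_2,
\qquad
g_2^{\mathrm{PC}} = g_2 - \frac{g_1 \cdot g_2}{\|g_1\|^2}\, g_1,
\qquad
g^{\mathrm{PC}} = g_1^{\mathrm{PC}} + g_2^{\mathrm{PC}}, \\[4pt]
g_1^{\mathrm{PC}} \cdot g_1 &= \|g_1\|^2 - \frac{(g_1 \cdot g_2)^2}{\|g_2\|^2} = \|g_1\|^2 \left(1 - \omega^2\right),
\qquad
g_2^{\mathrm{PC}} \cdot g_1 = g_1 \cdot g_2 - \frac{g_1 \cdot g_2}{\|g_1\|^2}\, \|g_1\|^2 = 0, \\[4pt]
g^{\mathrm{PC}} \cdot g_1 &= \|g_1\|^2 \left(1 - \omega^2\right) \ge 0,
\qquad
g^{\mathrm{PC}} \cdot g_2 = \|g_2\|^2 \left(1 - \omega^2\right) \ge 0, \\[4pt]
g^{\mathrm{PC}} \cdot (g_1 + g_2) &= \left(1 - \omega^2\right) \left(\|g_1\|^2 + \|g_2\|^2\right) \ge 0 .
\end{aligned}
```

Both per-task inner products are non-negative, so to first order neither task's loss increases along $-g^{\mathrm{PC}}$; they vanish only when $\omega = -1$, which matches the exactly-opposed case in the convergence result. An actual decrease of the loss still requires a sufficiently small step size.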

6. Limitations and Potential Directions

PCGrad treats all tasks as equal: projections are applied only when a direct conflict is present, and loss scales are not explicitly re-weighted. Settings with very large $K$ incur higher computational cost, which motivates sampling or layer-wise approximations (Zhou et al., 2021, Yu et al., 2020). PCGrad does not enforce full strong non-confliction as GradOPS does, nor does it solve a global max–min problem as ConicGrad does, so PCGrad solutions can remain suboptimal with respect to some Pareto objectives (Hassanpour et al., 31 Jan 2025, Zhu et al., 5 Mar 2025). Further extensions combine projection-based conflict resolution with adaptive weighting, meta-learned task prioritization, or global geometric constraints.
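The sampling idea can be sketched as follows. This is a hypothetical approximation written for illustration, not a method from the cited papers: each task is checked against only `m` randomly sampled other tasks instead of all $K - 1$. The names `sampled_pcgrad` and `m` are assumptions.

```python
import random
from typing import List

Vector = List[float]

def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))

def sampled_pcgrad(grads: List[Vector], m: int, rng: random.Random) -> Vector:
    """Hypothetical large-K approximation of PCGrad (requires m >= 0).

    Each task i is checked against at most m randomly sampled other tasks, so the
    pairwise cost drops from K * (K - 1) inner products to at most K * m. With
    m >= K - 1 every other task is visited in random order (standard PCGrad);
    with m = 0 the result is the plain gradient sum.
    """
    k = len(grads)
    sq_norms = [dot(g, g) for g in grads]
    out = [list(g) for g in grads]
    for i in range(k):
        others = [j for j in range(k) if j != i]
        for j in rng.sample(others, min(m, len(others))):  # random subset, random order
            d = dot(out[i], grads[j])
            if d < 0.0:
                s = d / sq_norms[j]
                out[i] = [a - s * b for a, b in zip(out[i], grads[j])]
    return [sum(c) for c in zip(*out)]

rng = random.Random(0)
grads = [[rng.gauss(0.0, 1.0) for _ in range(100)] for _ in range(50)]  # K = 50 tasks
approx = sampled_pcgrad(grads, m=5, rng=rng)  # at most 250 checks instead of 2450
```

Smaller `m` lowers the per-step cost but leaves some conflicting pairs unchecked in any given step.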

| Method | Principle | Computational complexity per step | Trade-off control |
|---|---|---|---|
| PCGrad | Pairwise conflict projection | $O(K^2 d)$ | Implicit; projection only |
| wPCGrad | Weighted conflict projection | $O(K^2 d)$ | Adaptive anchor selection |
| GradOPS | Subspace orthogonal projection | … | Trade-off parameter |
| ConicGrad | Cone-constrained max–min solution | … (via SMW) | Cone width |

Here, $K$ is the number of tasks and $d$ the parameter count. PCGrad provides a practical, model-agnostic, hyperparameter-free deconfliction strategy effective for a broad range of multi-objective learning problems, with theory and empirical results validating substantial improvements in accuracy, convergence, and optimization stability (Yu et al., 2020, Zhou et al., 2021, Bohn et al., 2024, Hassanpour et al., 31 Jan 2025, Zhu et al., 5 Mar 2025, Xiao et al., 16 Apr 2026).
