
GradVac: Gradient Alignment in HSI Transfer

Updated 15 December 2025
  • GradVac is a gradient alignment technique that mitigates conflicting gradients in multi-task and cross-domain HSI transfer using an EMA-adaptive cosine similarity threshold.
  • It adjusts source gradients by injecting a fraction of the target gradient whenever their alignment falls below the adaptive threshold, restoring cooperative parameter updates.
  • Empirical results demonstrate significant overall accuracy improvements in high-conflict scenarios, establishing GradVac as a crucial component in the ADGKT framework.

GradVac is a gradient alignment technique designed to mitigate optimization conflicts in multi-task and cross-domain learning, specifically within the context of cross-scene hyperspectral imaging (HSI) transfer. Incorporated into the Agreement–Disagreement Guided Knowledge Transfer (ADGKT) framework, GradVac systematically aligns source and target gradients in the shared encoder, reducing destructive interference and enabling more effective joint training across heterogeneous data distributions (Huo et al., 8 Dec 2025).

1. Formalization of the GradVac Gradient Alignment Objective

Given source and target gradients $g_s$ and $g_t$ of the shared encoder parameters with respect to the source loss $\mathcal{L}_s$ and target loss $\mathcal{L}_t$, GradVac operates by assessing their cosine similarity:

$$\varphi = \cos\theta = \frac{g_s \cdot g_t}{\|g_s\|\,\|g_t\|}$$

A running threshold $\alpha$ is maintained via an exponential moving average (EMA):

$$\alpha^{(t)} = (1 - \beta)\,\alpha^{(t-1)} + \beta\,\varphi^{(t-1)}, \qquad \alpha^{(0)} = 0$$

When $\varphi < \alpha$, indicating unacceptable gradient conflict, GradVac perturbs the source gradient by injecting a fraction $\eta$ of the target gradient, calculated so that the cosine similarity between the adjusted $g_s'$ and $g_t$ reaches a pre-set target value $\varphi^T$:

$$g_s' = g_s + \eta\, g_t$$

where

$$\eta = \frac{\|g_s\|\left(\varphi^T \sqrt{1-\varphi^2} - \varphi\sqrt{1-(\varphi^T)^2}\right)}{\|g_t\|\sqrt{1-(\varphi^T)^2}}$$

This is equivalent to finding the adjustment $\delta$ closest to $g_s$ under the constraint $\cos(\delta, g_t) \geq \varphi^T$ and then setting $g_s' = \delta$. The closed-form adjustment thus minimizes destructive interference in shared parameter updates.
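
For concreteness, here is a minimal NumPy sketch of this closed-form adjustment (the names `gradvac_adjust` and `phi_T` are illustrative, not drawn from the ADGKT paper):

```python
import numpy as np

def gradvac_adjust(g_s: np.ndarray, g_t: np.ndarray, phi_T: float = 0.9) -> np.ndarray:
    """Rotate g_s toward g_t just enough that cos(g_s', g_t) reaches phi_T."""
    # Cosine similarity between the flattened source and target gradients.
    phi = np.clip(g_s @ g_t / (np.linalg.norm(g_s) * np.linalg.norm(g_t) + 1e-12), -1.0, 1.0)
    # Closed-form step size eta from the formula above.
    eta = (np.linalg.norm(g_s)
           * (phi_T * np.sqrt(1.0 - phi**2) - phi * np.sqrt(1.0 - phi_T**2))
           / (np.linalg.norm(g_t) * np.sqrt(1.0 - phi_T**2) + 1e-12))
    return g_s + eta * g_t

# Orthogonal gradients (phi = 0) are realigned to cosine similarity ~0.9:
g_s, g_t = np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(gradvac_adjust(g_s, g_t))  # ~[1.0, 2.06]; cos(g_s', g_t) ~ 0.9
```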

2. Intuitive Rationale and Role in Optimization Dynamics

The underlying intuition is that conflicting gradients (i.e., low $\varphi$: nearly orthogonal or opposed directions) can degrade learning, as parameter updates compete rather than cooperatively advance generalization across tasks. GradVac detects these conditions using an adaptive, scenario-sensitive threshold $\alpha$ maintained via EMA, then reorients the source gradient toward the target direction just enough to achieve a configurable minimal alignment $\varphi^T$. This gradient “vaccination” intervenes only intermittently along the optimization trajectory, rather than enforcing constant coupling.

In practical terms, standard joint updates for shared parameters $\theta$ follow:

$$\theta \leftarrow \theta - \eta_0\, g_s - \eta_0\, g_t$$

With GradVac, the update replaces $g_s$ by $g_s'$ whenever $\varphi < \alpha$, actively suppressing destructive interactions. The EMA-driven $\alpha$ reflects how much gradient divergence is natural for the current task/domain pair, adapting the intervention frequency.

3. Procedure and Algorithmic Integration

The GradVac mechanism is implemented per training step according to the following procedure:

| Step | Description | Key Operations |
|------|-------------|----------------|
| 1 | Compute gradients | $g_s \gets \nabla_\theta \mathcal{L}_s$, $g_t \gets \nabla_\theta \mathcal{L}_t$ |
| 2 | Calculate cosine similarity | $\varphi \gets \frac{g_s \cdot g_t}{\lVert g_s\rVert\,\lVert g_t\rVert}$ |
| 3 | Update threshold | $\alpha \gets (1-\beta)\,\alpha + \beta\,\varphi$ |
| 4 | Check and adjust | If $\varphi < \alpha$: compute $\eta$ and set $g_s' = g_s + \eta g_t$; else $g_s' = g_s$ |
| 5 | Update parameters | $\theta \leftarrow \theta - \eta_0\, g_s' - \eta_0\, g_t$ |

Editor's term: “EMA-adaptive gradient thresholding” denotes the adaptive management of $\alpha$ via stepwise exponential averaging.
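
For illustration, a minimal PyTorch sketch of this per-step procedure follows; `encoder`, the two losses, the plain SGD update, and the threading of `alpha` as an explicit argument are assumptions of this sketch, not details from the paper:

```python
import torch
from torch.nn.utils import parameters_to_vector

def gradvac_step(encoder, loss_s, loss_t, alpha, lr=1e-3, beta=0.1, phi_T=0.9):
    """One GradVac step over the shared encoder; returns the updated EMA threshold."""
    params = [p for p in encoder.parameters() if p.requires_grad]
    # Step 1: per-task gradients of the shared encoder, flattened to vectors.
    g_s = parameters_to_vector(torch.autograd.grad(loss_s, params, retain_graph=True))
    g_t = parameters_to_vector(torch.autograd.grad(loss_t, params))
    # Step 2: cosine similarity between source and target gradients.
    phi = torch.dot(g_s, g_t) / (g_s.norm() * g_t.norm() + 1e-12)
    # Step 3: EMA update of the conflict threshold alpha.
    alpha = (1 - beta) * alpha + beta * phi.item()
    # Step 4: adjust the source gradient only when conflict is detected.
    if phi.item() < alpha:
        root = (1.0 - phi_T**2) ** 0.5
        eta = (g_s.norm()
               * (phi_T * torch.sqrt(torch.clamp(1 - phi**2, min=0.0)) - phi * root)
               / (g_t.norm() * root + 1e-12))
        g_s = g_s + eta * g_t
    # Step 5: plain SGD update with the (possibly adjusted) source gradient;
    # a stand-in for whatever optimizer ADGKT actually uses.
    update, offset = lr * (g_s + g_t), 0
    with torch.no_grad():
        for p in params:
            p -= update[offset:offset + p.numel()].view_as(p)
            offset += p.numel()
    return alpha
```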

4. Hyperparameter Choices and Ablation Outcomes

GradVac’s performance is sensitive to its hyperparameters:

  • $\beta$: EMA momentum for the threshold $\alpha$ (e.g., $\beta = 0.1$ for I→P; $\beta = 0.01$ for H→P)
  • $\varphi^T$: target post-adjustment cosine similarity, set to a high value (e.g., $0.9$); not extensively varied in experiments.

Empirical ablation (see Table 6 from (Huo et al., 8 Dec 2025)) demonstrates:

| Scenario | OA Change | OA (Before → After) |
|----------|-----------|---------------------|
| I→P | $+5.36\%$ | $76.26\% \rightarrow 81.62\%$ |
| H→P | $+6.82\%$ | $73.61\% \rightarrow 80.43\%$ |
| P→H | $-2.11\%$ | $79.24\% \rightarrow 77.13\%$ |
| I→H | $-0.47\%$ | $79.24\% \rightarrow 78.77\%$ |

These results indicate that GradVac yields substantial improvement where cross-scene gradients are highly discordant, but may degrade performance if domain shifts are moderate or structured differently. This highlights the necessity of robust $\alpha$ updating and complementary mechanisms.

5. Empirical Impact and Interaction With Companion Methods

Within ADGKT, GradVac constitutes one part of the “agreement” block and is coupled directly with LogitNorm. LogitNorm normalizes pre-softmax logits, reducing the risk of magnitude-based domination by either the source or target branch. Ablation reveals:

  • GradVac alone (✓ – – –): immediate OA gains in high-conflict scenarios.
  • GradVac + LogitNorm (✓ ✓ – –): further improvement (e.g., $+3.26\%$ OA on I→P, $+0.78\%$ on H→P).
  • Full Agreement–Disagreement block (✓ ✓ ✓ ✓): best overall accuracy ($87.52\%$ OA on I→P).

Without agreement mechanisms, naïve joint training often underfits small-target domains or results in poor compromise solutions. GradVac produces immediate OA improvements when severe gradient conflicts exist. LogitNorm prevents the source network (typically with more abundant data) from overwhelming optimization via large logits and commensurately larger gradient magnitudes. Together, these mechanisms form the prerequisite for the “disagreement” block’s ensemble feature extraction, which targets diversity in target scene representation.
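
For reference, a common formulation of logit normalization divides each logit vector by its L2 norm (scaled by a temperature) before the softmax cross-entropy. The sketch below shows that general technique; ADGKT's exact LogitNorm variant and the temperature `tau` are assumptions here:

```python
import torch
import torch.nn.functional as F

def logitnorm_loss(logits: torch.Tensor, targets: torch.Tensor, tau: float = 0.04):
    """Cross-entropy on L2-normalized logits, capping gradient magnitude."""
    # Normalizing removes the logit-scale degree of freedom, so neither the
    # source nor the target branch can dominate via sheer logit magnitude.
    norms = logits.norm(p=2, dim=-1, keepdim=True) + 1e-7
    return F.cross_entropy(logits / (norms * tau), targets)
```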

6. Contextual Role within Cross-Scene HSI Transfer

GradVac was developed to address a core limitation of prior cross-domain HSI transfer methods—specifically, the prevalence of gradient conflicts between source and target tasks in the optimization of shared encoder parameters. By enabling adaptive gradient realignment, GradVac facilitates balanced knowledge transfer, allowing more complete exploitation of scene diversity in joint learning setups. Its empirical efficacy is scene-pair dependent, excelling in large domain shift situations, but its corrective actions can become counterproductive in low-conflict or tightly coupled domains. Thus, GradVac is best deployed in concert with dynamic thresholding and complementary mechanisms such as LogitNorm for robust performance across varied cross-scene transfer settings (Huo et al., 8 Dec 2025).
