
GradVac: Gradient Alignment in HSI Transfer

Updated 15 December 2025
  • GradVac is a gradient alignment technique that mitigates conflicting gradients in multi-task and cross-domain HSI transfer using an EMA-adaptive cosine similarity threshold.
  • It adjusts source gradients by injecting a fraction of the target gradient whenever their alignment falls below the adaptive threshold, restoring cooperative parameter updates.
  • Empirical results demonstrate significant overall accuracy improvements in high-conflict scenarios, establishing GradVac as a crucial component in the ADGKT framework.

GradVac is a gradient alignment technique designed to mitigate optimization conflicts in multi-task and cross-domain learning, specifically within the context of cross-scene hyperspectral imaging (HSI) transfer. Incorporated into the Agreement–Disagreement Guided Knowledge Transfer (ADGKT) framework, GradVac systematically aligns source and target gradients in the shared encoder, reducing destructive interference and enabling more effective joint training across heterogeneous data distributions (Huo et al., 8 Dec 2025).

1. Formalization of the GradVac Gradient Alignment Objective

Given source and target gradients $g_s$ and $g_t$ of the shared encoder parameters with respect to the source loss $\mathcal{L}_s$ and target loss $\mathcal{L}_t$, GradVac operates by assessing their cosine similarity:

$$\varphi = \cos\theta = \frac{g_s \cdot g_t}{\|g_s\|\,\|g_t\|}$$

A running threshold $\alpha$ is maintained via an exponential moving average (EMA):

$$\alpha^{(t)} = (1 - \beta)\,\alpha^{(t-1)} + \beta\,\varphi^{(t-1)}, \qquad \alpha^{(0)} = 0$$

When $\varphi < \alpha$, indicating unacceptable gradient conflict, GradVac perturbs the source gradient by injecting a fraction $\eta$ of the target gradient, calculated so that the cosine similarity between the adjusted $g_s'$ and $g_t$ reaches a pre-set target value $\varphi^T$:

$$g_s' = g_s + \eta\, g_t$$

where

$$\eta = \frac{\|g_s\|\left(\varphi^T \sqrt{1-\varphi^2} - \varphi\sqrt{1-(\varphi^T)^2}\right)}{\|g_t\|\sqrt{1-(\varphi^T)^2}}$$

This is equivalent to finding the adjustment $\delta$ closest to $g_s$ under the constraint $\cos(\delta, g_t) \geq \varphi^T$ and then setting $g_s' = \delta$. The closed-form adjustment thus minimizes destructive interference in shared parameter updates.
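
For concreteness, here is a minimal NumPy sketch of this closed-form adjustment (the names `gradvac_adjust` and `phi_T` are illustrative, not drawn from the ADGKT paper):

```python
import numpy as np

def gradvac_adjust(g_s: np.ndarray, g_t: np.ndarray, phi_T: float = 0.9) -> np.ndarray:
    """Rotate g_s toward g_t just enough that cos(g_s', g_t) reaches phi_T."""
    # Cosine similarity between the flattened source and target gradients.
    phi = np.clip(g_s @ g_t / (np.linalg.norm(g_s) * np.linalg.norm(g_t) + 1e-12), -1.0, 1.0)
    # Closed-form step size eta from the formula above.
    eta = (np.linalg.norm(g_s)
           * (phi_T * np.sqrt(1.0 - phi**2) - phi * np.sqrt(1.0 - phi_T**2))
           / (np.linalg.norm(g_t) * np.sqrt(1.0 - phi_T**2) + 1e-12))
    return g_s + eta * g_t

# Orthogonal gradients (phi = 0) are realigned to cosine similarity ~0.9:
g_s, g_t = np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(gradvac_adjust(g_s, g_t))  # ~[1.0, 2.06]; cos(g_s', g_t) ~ 0.9
```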

2. Intuitive Rationale and Role in Optimization Dynamics

The underlying intuition is that conflicting gradients (i.e., low $\varphi$: nearly orthogonal or opposed directions) can degrade learning, as parameter updates compete rather than cooperatively advance generalization across tasks. GradVac detects these conditions using an adaptive, scenario-sensitive threshold $\alpha$ maintained via EMA, then reorients the source gradient toward the target direction just enough to achieve a configurable minimal alignment $\varphi^T$. This gradient “vaccination” intervenes only intermittently along the optimization trajectory, rather than enforcing constant coupling.

In practical terms, standard joint updates for shared parameters $\theta$ follow:

$$\theta \leftarrow \theta - \eta_0\, g_s - \eta_0\, g_t$$

With GradVac, the update replaces $g_s$ by $g_s'$ whenever $\varphi < \alpha$, actively suppressing destructive interactions. The EMA-driven $\alpha$ reflects how much gradient divergence is natural for the current task/domain pair, adapting the intervention frequency.

3. Procedure and Algorithmic Integration

The GradVac mechanism is implemented per training step according to the following procedure:

| Step | Description | Key Operations |
|------|-------------|----------------|
| 1 | Compute gradients | $g_s \gets \nabla_\theta \mathcal{L}_s$, $g_t \gets \nabla_\theta \mathcal{L}_t$ |
| 2 | Calculate cosine similarity | $\varphi \gets \frac{g_s \cdot g_t}{\lVert g_s\rVert\,\lVert g_t\rVert}$ |
| 3 | Update threshold | $\alpha \gets (1-\beta)\,\alpha + \beta\,\varphi$ |
| 4 | Check and adjust | If $\varphi < \alpha$: compute $\eta$ and set $g_s' = g_s + \eta g_t$; else $g_s' = g_s$ |
| 5 | Update parameters | $\theta \leftarrow \theta - \eta_0\, g_s' - \eta_0\, g_t$ |

Editor's term: “EMA-adaptive gradient thresholding” denotes the adaptive management of $\alpha$ via stepwise exponential averaging.
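
For illustration, a minimal PyTorch sketch of this per-step procedure follows; `encoder`, the two losses, the plain SGD update, and the threading of `alpha` as an explicit argument are assumptions of this sketch, not details from the paper:

```python
import torch
from torch.nn.utils import parameters_to_vector

def gradvac_step(encoder, loss_s, loss_t, alpha, lr=1e-3, beta=0.1, phi_T=0.9):
    """One GradVac step over the shared encoder; returns the updated EMA threshold."""
    params = [p for p in encoder.parameters() if p.requires_grad]
    # Step 1: per-task gradients of the shared encoder, flattened to vectors.
    g_s = parameters_to_vector(torch.autograd.grad(loss_s, params, retain_graph=True))
    g_t = parameters_to_vector(torch.autograd.grad(loss_t, params))
    # Step 2: cosine similarity between source and target gradients.
    phi = torch.dot(g_s, g_t) / (g_s.norm() * g_t.norm() + 1e-12)
    # Step 3: EMA update of the conflict threshold alpha.
    alpha = (1 - beta) * alpha + beta * phi.item()
    # Step 4: adjust the source gradient only when conflict is detected.
    if phi.item() < alpha:
        root = (1.0 - phi_T**2) ** 0.5
        eta = (g_s.norm()
               * (phi_T * torch.sqrt(torch.clamp(1 - phi**2, min=0.0)) - phi * root)
               / (g_t.norm() * root + 1e-12))
        g_s = g_s + eta * g_t
    # Step 5: plain SGD update with the (possibly adjusted) source gradient;
    # a stand-in for whatever optimizer ADGKT actually uses.
    update, offset = lr * (g_s + g_t), 0
    with torch.no_grad():
        for p in params:
            p -= update[offset:offset + p.numel()].view_as(p)
            offset += p.numel()
    return alpha
```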

4. Hyperparameter Choices and Ablation Outcomes

GradVac’s performance is sensitive to its hyperparameters:

  • $\beta$: EMA momentum for the threshold $\alpha$ (e.g., $\beta = 0.1$ for I→P; $\beta = 0.01$ for H→P)
  • $\varphi^T$: target post-adjustment cosine similarity, set to a high value (e.g., $0.9$); not extensively varied in experiments.

Empirical ablation (see Table 6 from (Huo et al., 8 Dec 2025)) demonstrates:

| Scenario | OA Change | OA (Before → After) |
|----------|-----------|---------------------|
| I→P | $+5.36\%$ | $76.26\% \rightarrow 81.62\%$ |
| H→P | $+6.82\%$ | $73.61\% \rightarrow 80.43\%$ |
| P→H | $-2.11\%$ | $79.24\% \rightarrow 77.13\%$ |
| I→H | $-0.47\%$ | $79.24\% \rightarrow 78.77\%$ |

These results indicate that GradVac yields substantial improvement where cross-scene gradients are highly discordant, but may degrade performance if domain shifts are moderate or structured differently. This highlights the necessity of robust $\alpha$ updating and complementary mechanisms.

5. Empirical Impact and Interaction With Companion Methods

Within ADGKT, GradVac constitutes one part of the “agreement” block and is coupled directly with LogitNorm. LogitNorm normalizes pre-softmax logits, reducing the risk of magnitude-based domination by either the source or target branch. Ablation reveals:

  • GradVac alone (✓ – – –): immediate OA gains in high-conflict scenarios.
  • GradVac + LogitNorm (✓ ✓ – –): further improvement (e.g., $+3.26\%$ OA on I→P, $+0.78\%$ on H→P).
  • Full Agreement–Disagreement block (✓ ✓ ✓ ✓): best overall accuracy ($87.52\%$ OA on I→P).

Without agreement mechanisms, naïve joint training often underfits small-target domains or results in poor compromise solutions. GradVac produces immediate OA improvements when severe gradient conflicts exist. LogitNorm prevents the source network (typically with more abundant data) from overwhelming optimization via large logits and commensurately larger gradient magnitudes. Together, these mechanisms form the prerequisite for the “disagreement” block’s ensemble feature extraction, which targets diversity in target scene representation.
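
For reference, a common formulation of logit normalization divides each logit vector by its L2 norm (scaled by a temperature) before the softmax cross-entropy. The sketch below shows that general technique; ADGKT's exact LogitNorm variant and the temperature `tau` are assumptions here:

```python
import torch
import torch.nn.functional as F

def logitnorm_loss(logits: torch.Tensor, targets: torch.Tensor, tau: float = 0.04):
    """Cross-entropy on L2-normalized logits, capping gradient magnitude."""
    # Normalizing removes the logit-scale degree of freedom, so neither the
    # source nor the target branch can dominate via sheer logit magnitude.
    norms = logits.norm(p=2, dim=-1, keepdim=True) + 1e-7
    return F.cross_entropy(logits / (norms * tau), targets)
```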

6. Contextual Role within Cross-Scene HSI Transfer

GradVac was developed to address a core limitation of prior cross-domain HSI transfer methods—specifically, the prevalence of gradient conflicts between source and target tasks in the optimization of shared encoder parameters. By enabling adaptive gradient realignment, GradVac facilitates balanced knowledge transfer, allowing more complete exploitation of scene diversity in joint learning setups. Its empirical efficacy is scene-pair dependent, excelling in large domain shift situations, but its corrective actions can become counterproductive in low-conflict or tightly coupled domains. Thus, GradVac is best deployed in concert with dynamic thresholding and complementary mechanisms such as LogitNorm for robust performance across varied cross-scene transfer settings (Huo et al., 8 Dec 2025).
