GradVac: Gradient Alignment in HSI Transfer
- GradVac is a gradient alignment technique that mitigates conflicting gradients in multi-task and cross-domain HSI transfer using an EMA-adaptive cosine similarity threshold.
- It adjusts source gradients by injecting a fraction of the target gradient whenever alignment falls below the adaptive threshold, restoring a preset target alignment and ensuring cooperative parameter updates.
- Empirical results demonstrate significant overall accuracy improvements in high-conflict scenarios, establishing GradVac as a crucial component in the ADGKT framework.
GradVac is a gradient alignment technique designed to mitigate optimization conflicts in multi-task and cross-domain learning, specifically within the context of cross-scene hyperspectral imaging (HSI) transfer. Incorporated into the Agreement–Disagreement Guided Knowledge Transfer (ADGKT) framework, GradVac systematically aligns source and target gradients in the shared encoder, reducing destructive interference and enabling more effective joint training across heterogeneous data distributions (Huo et al., 8 Dec 2025).
1. Formalization of the GradVac Gradient Alignment Objective
Given source and target gradients $g_s = \nabla_\theta \mathcal{L}_s$ and $g_t = \nabla_\theta \mathcal{L}_t$ of the shared encoder parameters $\theta$ with respect to the source loss $\mathcal{L}_s$ and the target loss $\mathcal{L}_t$, GradVac operates by assessing their cosine similarity:

$$\phi_{st} = \frac{g_s \cdot g_t}{\|g_s\|\,\|g_t\|}$$
A running threshold $\hat{\phi}_{st}$ is maintained via exponential moving average (EMA):

$$\hat{\phi}_{st} \leftarrow \beta\,\hat{\phi}_{st} + (1-\beta)\,\phi_{st}$$

where $\beta$ is the EMA momentum.
When $\phi_{st} < \hat{\phi}_{st}$, indicating unacceptable gradient conflict, GradVac perturbs the source gradient by injecting a fraction of the target gradient, calculated so that the cosine similarity between the adjusted $g_s$ and $g_t$ reaches a pre-set target value $\phi_T$:

$$g_s \leftarrow g_s + \lambda\, g_t$$

where

$$\lambda = \frac{\|g_s\|\left(\phi_T\sqrt{1-\phi_{st}^2} - \phi_{st}\sqrt{1-\phi_T^2}\right)}{\|g_t\|\,\sqrt{1-\phi_T^2}}$$

This is equivalent to solving for the adjustment closest to $g_s$ under the constraint $\cos(g_s + \lambda g_t,\, g_t) = \phi_T$ and then substituting $g_s \leftarrow g_s + \lambda g_t$ into the joint update. This closed-form adjustment targets minimization of destructive interference in shared parameter updates.
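The closed-form injection is straightforward to implement. Below is a minimal NumPy sketch (the function name, argument names, and default $\phi_T$ are illustrative, not taken from the paper; the threshold check that decides whether the adjustment fires is left to the caller):

```python
import numpy as np

def gradvac_adjust(g_s: np.ndarray, g_t: np.ndarray, phi_target: float = 0.9) -> np.ndarray:
    """Return g_s + lam * g_t, with lam chosen so cos(adjusted g_s, g_t) = phi_target."""
    norm_s, norm_t = np.linalg.norm(g_s), np.linalg.norm(g_t)
    phi = float(g_s @ g_t) / (norm_s * norm_t)            # current cosine similarity
    lam = (norm_s * (phi_target * np.sqrt(1.0 - phi**2)
                     - phi * np.sqrt(1.0 - phi_target**2))
           / (norm_t * np.sqrt(1.0 - phi_target**2)))     # closed-form coefficient
    return g_s + lam * g_t

# Usage: orthogonal unit gradients are realigned to cosine 0.9 with g_t.
g_s, g_t = np.array([1.0, 0.0]), np.array([0.0, 1.0])
g_adj = gradvac_adjust(g_s, g_t, phi_target=0.9)          # cos(g_adj, g_t) ≈ 0.9
```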
2. Intuitive Rationale and Role in Optimization Dynamics
The underlying intuition is that conflicting gradients (i.e., low $\phi_{st}$, nearly orthogonal or opposed directions) can degrade learning, as parameter updates compete rather than cooperatively advance generalization across tasks. GradVac detects these conditions using an adaptive, scenario-sensitive threshold via EMA, then reorients the source gradient toward the target direction just enough to achieve a configurable minimal alignment ($\phi_T$). This gradient “vaccination” modulates learning stochastically across optimization trajectories, rather than enforcing constant coupling.
In practical terms, standard joint updates for the shared parameters $\theta$ follow:

$$\theta \leftarrow \theta - \eta\,(g_s + g_t)$$

with learning rate $\eta$.
With GradVac, the update replaces $g_s$ by $g_s' = g_s + \lambda g_t$ when $\phi_{st} < \hat{\phi}_{st}$, actively suppressing destructive interactions. The EMA-driven $\hat{\phi}_{st}$ reflects how much gradient divergence is natural for the current task/domain pair, adapting the intervention frequency.
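As a concrete numeric illustration (values chosen for exposition, not taken from the paper): for unit-norm, exactly orthogonal gradients ($\phi_{st} = 0$) and target $\phi_T = 0.9$, the closed form gives

$$\lambda = \frac{0.9}{\sqrt{1 - 0.81}} \approx 2.065, \qquad \cos(g_s + 2.065\,g_t,\; g_t) = \frac{2.065}{\sqrt{1 + 2.065^2}} \approx 0.90,$$

so a single injection restores exactly the configured alignment.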
3. Procedure and Algorithmic Integration
The GradVac mechanism is implemented per training step according to the following procedure:
| Step | Description | Key Operations |
|---|---|---|
| 1 | Compute Gradients | $g_s = \nabla_\theta \mathcal{L}_s$, $g_t = \nabla_\theta \mathcal{L}_t$ |
| 2 | Calculate Cosine Similarity | $\phi_{st} = \frac{g_s \cdot g_t}{\lVert g_s \rVert\,\lVert g_t \rVert}$ |
| 3 | Update Threshold | $\hat{\phi}_{st} \leftarrow \beta\,\hat{\phi}_{st} + (1-\beta)\,\phi_{st}$ |
| 4 | Check and Adjust | If $\phi_{st} < \hat{\phi}_{st}$: compute $\lambda$, set $g_s \leftarrow g_s + \lambda g_t$; else leave $g_s$ unchanged |
| 5 | Update Parameters | $\theta \leftarrow \theta - \eta\,(g_s + g_t)$ |
Editor's term: “EMA-adaptive gradient thresholding” denotes the adaptive management of $\hat{\phi}_{st}$ via stepwise exponential averaging.
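Putting the five steps together, a single shared-encoder update might look like the following NumPy sketch (hyperparameter values, the dictionary-based EMA state, and plain SGD are illustrative assumptions, not details from the paper):

```python
import numpy as np

def gradvac_step(theta, g_s, g_t, state, beta=0.99, phi_target=0.9, lr=1e-3):
    """One GradVac-regulated joint update on shared parameters theta."""
    phi = float(g_s @ g_t) / (np.linalg.norm(g_s) * np.linalg.norm(g_t))
    # Step 3: EMA threshold update (beta is the momentum on phi_hat).
    state["phi_hat"] = beta * state["phi_hat"] + (1.0 - beta) * phi
    # Step 4: vaccinate the source gradient only under unacceptable conflict.
    if phi < state["phi_hat"]:
        lam = (np.linalg.norm(g_s)
               * (phi_target * np.sqrt(1.0 - phi**2)
                  - phi * np.sqrt(1.0 - phi_target**2))
               / (np.linalg.norm(g_t) * np.sqrt(1.0 - phi_target**2)))
        g_s = g_s + lam * g_t
    # Step 5: standard joint update on the shared encoder parameters.
    return theta - lr * (g_s + g_t)

# Usage: the EMA state persists across training steps.
state = {"phi_hat": 0.0}
theta = np.zeros(2)
theta = gradvac_step(theta, np.array([1.0, 0.1]), np.array([0.1, 1.0]), state)
```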
4. Hyperparameter Choices and Ablation Outcomes
GradVac’s performance is sensitive to its hyperparameters:
- $\beta$: EMA momentum for the threshold $\hat{\phi}_{st}$, tuned per scenario (e.g., different values for I→P and H→P)
- $\phi_T$: Post-adjustment cosine similarity, set to a high value (e.g., $0.9$), not extensively varied in experiments.
Empirical ablation (see Table 6 from (Huo et al., 8 Dec 2025)) demonstrates:
| Scenario | OA Change | OA (Before → After) |
|---|---|---|
| I→P | | |
| H→P | | |
| P→H | | |
| I→H | | |
These results indicate that GradVac yields substantial improvement where cross-scene gradients are highly discordant, but may degrade performance when domain shifts are moderate or structured differently. This highlights the necessity of robust threshold adaptation and of complementary mechanisms.
5. Empirical Impact and Interaction With Companion Methods
Within ADGKT, GradVac constitutes one part of the “agreement” block and is coupled directly with LogitNorm. LogitNorm normalizes pre-softmax logits, reducing the risk of magnitude-based domination by either the source or target branch. Ablation reveals:
- GradVac alone (✓ – – –): Immediate OA gains in high-conflict scenarios.
- GradVac + LogitNorm (✓ ✓ – –): Further OA improvement on I→P and H→P.
- Full Agreement–Disagreement Block (✓ ✓ ✓ ✓): Optimal overall accuracy on I→P.
Without agreement mechanisms, naïve joint training often underfits small-target domains or results in poor compromise solutions. GradVac produces immediate OA improvements when severe gradient conflicts exist. LogitNorm prevents the source network (typically with more abundant data) from overwhelming optimization via large logits and commensurately larger gradient magnitudes. Together, these mechanisms form the prerequisite for the “disagreement” block’s ensemble feature extraction, which targets diversity in target scene representation.
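The interaction with LogitNorm is easy to picture in code. Below is a minimal sketch of cross-entropy on L2-normalized logits in the spirit of LogitNorm (the temperature $\tau$ and its default value are illustrative assumptions; the paper's exact variant may differ):

```python
import numpy as np

def logitnorm_loss(logits: np.ndarray, labels: np.ndarray, tau: float = 0.04) -> float:
    """Cross-entropy on temperature-scaled, L2-normalized logits.

    Normalizing each sample's logit vector bounds its magnitude, so neither
    branch can dominate the joint loss (and gradients) via large logits.
    """
    norms = np.linalg.norm(logits, axis=1, keepdims=True) + 1e-7  # per-sample L2 norm
    z = logits / (tau * norms)                                    # normalized logits
    z = z - z.max(axis=1, keepdims=True)                          # stable log-softmax
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())
```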
6. Contextual Role within Cross-Scene HSI Transfer
GradVac was developed to address a core limitation of prior cross-domain HSI transfer methods—specifically, the prevalence of gradient conflicts between source and target tasks in the optimization of shared encoder parameters. By enabling adaptive gradient realignment, GradVac facilitates balanced knowledge transfer, allowing more complete exploitation of scene diversity in joint learning setups. Its empirical efficacy is scene-pair dependent, excelling in large domain shift situations, but its corrective actions can become counterproductive in low-conflict or tightly coupled domains. Thus, GradVac is best deployed in concert with dynamic thresholding and complementary mechanisms such as LogitNorm for robust performance across varied cross-scene transfer settings (Huo et al., 8 Dec 2025).