Group Orthogonal Low-Rank Adaptation
- GOLA is a parameter-efficient fine-tuning framework that reduces redundancy through structured rank decomposition, selective freezing, and clustering.
- It employs an inter-group orthogonality constraint that enforces diverse, complementary feature representations for enhanced RGB-T tracking.
- Empirical results show that GOLA variants achieve superior tracking accuracy and efficiency with fewer trainable parameters compared to baseline methods.
Group Orthogonal Low-Rank Adaptation (GOLA) is a parameter-efficient fine-tuning framework designed to enhance feature expressiveness and minimize information redundancy in low-rank adaptation modules, particularly for RGB-T (Red-Green-Blue and Thermal) tracking tasks. GOLA builds upon the low-rank adaptation (LoRA) paradigm by introducing principled rank selection, parameter freezing, clustering, and a novel inter-group orthogonality constraint, resulting in improved adaptability and efficiency for downstream tracking applications (Shao et al., 5 Dec 2025).
1. Low-Rank Adaptation Preliminaries
GOLA operates within the standard low-rank adaptation framework: given a pretrained backbone with a weight matrix $W_0 \in \mathbb{R}^{m \times n}$ (e.g., for linear or attention-projection layers), fine-tuning is constrained to a learnable low-rank "adapter" $\Delta W = BA$. In LoRA, the adapted layer computes

$$h = W_0 x + \Delta W x = W_0 x + BAx,$$

with $B \in \mathbb{R}^{m \times r}$, $A \in \mathbb{R}^{r \times n}$, and $r \ll \min(m, n)$. At inference, this merges to $W = W_0 + BA$. The update can be equivalently expressed through a singular value decomposition (SVD):

$$\Delta W = U \Sigma V^{\top} = \sum_{i=1}^{r} \sigma_i \, u_i v_i^{\top},$$

where $u_i$, $v_i$, and $\sigma_i$ are respectively the left and right singular vectors and singular values.
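As a concrete reference point, the following minimal PyTorch sketch implements the adapted forward pass and the inference-time merge (illustrative names such as `LoRALinear`; not code from the paper):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA-adapted linear layer: h = W0 x + B A x, with W0 frozen."""

    def __init__(self, d_in: int, d_out: int, rank: int = 8):
        super().__init__()
        self.base = nn.Linear(d_in, d_out, bias=False)
        self.base.weight.requires_grad_(False)                 # pretrained W0 stays frozen
        self.A = nn.Parameter(torch.randn(rank, d_in) * 0.01)  # r x n, small random init
        self.B = nn.Parameter(torch.zeros(d_out, rank))        # m x r, zero init => dW = 0 at start

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + x @ self.A.T @ self.B.T          # W0 x + B A x

    @torch.no_grad()
    def merged_weight(self) -> torch.Tensor:
        """Inference-time merge: W = W0 + B A."""
        return self.base.weight + self.B @ self.A
```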
2. Quantifying Rank Importance through Decomposition
GOLA's central innovation is identifying redundancy within the rank space produced by LoRA-style adapters. This is accomplished by performing an SVD on the mean-centered matrix

$$\bar{B} = B - \bar{b}\,\mathbf{1}^{\top}, \qquad \bar{b} = \frac{1}{r} \sum_{j=1}^{r} b_j,$$

followed by

$$\bar{B} = U \Sigma V^{\top},$$

where $U \in \mathbb{R}^{m \times r}$, $\Sigma = \operatorname{diag}(\sigma_1, \dots, \sigma_r)$. The top-$k$ singular vectors and values $(U_k, \boldsymbol{\sigma}_k)$ are set as reference directions.

An $\ell_2$-normalized importance score is then computed for each original column $b_i$ of $B$ by

$$s_i = \left\| \left( U_k^{\top} b_i \right) \odot \boldsymbol{\sigma}_k \right\|_2,$$

with $\odot$ denoting elementwise multiplication. Stacking all $s_i$ into $s \in \mathbb{R}^{r}$ (rescaled to unit $\ell_2$ norm) and sorting descending yields an ordering $\pi$ such that $s_{\pi(1)} \ge s_{\pi(2)} \ge \cdots \ge s_{\pi(r)}$.
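A minimal sketch of this scoring step, assuming the reconstruction above (the paper's exact normalization may differ, and `rank_importance` is an illustrative name):

```python
import torch

def rank_importance(B: torch.Tensor, top_k: int):
    """Score each LoRA rank (a column of B) against the top-k reference
    directions of the mean-centered SVD, per the reconstruction above."""
    B_centered = B - B.mean(dim=1, keepdim=True)          # subtract the column mean
    U, S, _ = torch.linalg.svd(B_centered, full_matrices=False)
    U_k, sigma_k = U[:, :top_k], S[:top_k]                # reference directions/values
    proj = U_k.T @ B                                      # (top_k, r) projections
    scores = (proj * sigma_k.unsqueeze(1)).norm(dim=0)    # weight by singular values
    scores = scores / scores.norm()                       # l2-normalize the stacked scores
    order = torch.argsort(scores, descending=True)        # pi: most -> least important
    return scores, order
```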
3. Structured Freezing and Clustering of Ranks
GOLA categorizes ranks into "crucial" and "redundant" components. The top-$m$ indices $\mathcal{I}_{\mathrm{c}} = \{\pi(1), \dots, \pi(m)\}$, corresponding to the highest $s_i$, are deemed crucial and their associated adapter columns/rows are frozen to preserve pretrained priors:
- $B_{\mathrm{c}} = B_{:,\mathcal{I}_{\mathrm{c}}}$ and $A_{\mathrm{c}} = A_{\mathcal{I}_{\mathrm{c}},:}$ (frozen)
- The remaining indices $\mathcal{I}_{\mathrm{r}}$ form $B_{\mathrm{r}} = B_{:,\mathcal{I}_{\mathrm{r}}}$, $A_{\mathrm{r}} = A_{\mathcal{I}_{\mathrm{r}},:}$ (unfrozen, "redundant")

Redundant ranks are partitioned into $G$ groups $\mathcal{G}_1, \dots, \mathcal{G}_G$ using constrained k-means clustering on the columns of $B_{\mathrm{r}}$:

$$\min_{\{\mathcal{G}_g\}} \sum_{g=1}^{G} \sum_{i \in \mathcal{G}_g} \left\| b_i - \mu_g \right\|_2^2, \qquad \mu_g = \frac{1}{|\mathcal{G}_g|} \sum_{i \in \mathcal{G}_g} b_i,$$

which minimizes the within-group sum of squares, subject to approximately balanced group sizes.
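The sketch below combines the freezing and grouping steps; the capacity-capped greedy assignment only approximates constrained (balanced) k-means, and all names are illustrative:

```python
import torch

def freeze_and_cluster(B: torch.Tensor, order: torch.Tensor,
                       n_crucial: int, n_groups: int, iters: int = 20):
    """Split ranks into frozen 'crucial' and grouped 'redundant' sets.
    The capacity-capped assignment is a sketch, not the paper's solver."""
    crucial = order[:n_crucial]                  # highest-scoring ranks: freeze these
    redundant = order[n_crucial:]                # remaining ranks: cluster and train
    X = B[:, redundant].T                        # one point per redundant column of B
    cap = -(-len(redundant) // n_groups)         # ceil(n/G): max members per group
    centroids = X[torch.randperm(len(X))[:n_groups]].clone()
    for _ in range(iters):
        dists = torch.cdist(X, centroids)        # (n_redundant, n_groups)
        assign = torch.full((len(X),), -1, dtype=torch.long)
        counts = torch.zeros(n_groups, dtype=torch.long)
        # assign the most confident points first (smallest nearest-centroid distance)
        for i in dists.min(dim=1).values.argsort():
            for g in dists[i].argsort():         # nearest group with free capacity
                if counts[g] < cap:
                    assign[i], counts[g] = g, counts[g] + 1
                    break
        for g in range(n_groups):                # Lloyd update of non-empty centroids
            if (assign == g).any():
                centroids[g] = X[assign == g].mean(dim=0)
    groups = [redundant[assign == g] for g in range(n_groups)]
    return crucial, groups
```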
4. Inter-Group Orthogonality Constraint
To force redundant groups to learn diverse and complementary features, GOLA applies an inter-group orthogonality loss across the $G$ groups. With the squared Frobenius norm of the cross-group Gram matrix measuring overlap, the regularizer is

$$\mathcal{L}_{\mathrm{orth}} = \sum_{g \ne g'} \left\| B_g^{\top} B_{g'} \right\|_F^2,$$

where $B_g$, $A_g$ are the adapters in group $g$. In practice, a random pair $(g, g')$ is sampled per iteration to compute this penalty, enhancing computational efficiency (Shao et al., 5 Dec 2025).
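In code, the sampled-pair form of the penalty amounts to a few lines (a sketch under the reconstruction above; whether the A-factors are also penalized is not shown here):

```python
import random
import torch

def orthogonality_penalty(group_B: list[torch.Tensor]) -> torch.Tensor:
    """Inter-group orthogonality penalty on one randomly sampled pair of
    groups; group_B[g] holds the B-columns assigned to group g."""
    g1, g2 = random.sample(range(len(group_B)), 2)   # one pair per iteration
    cross = group_B[g1].T @ group_B[g2]              # cross-group Gram matrix
    return cross.pow(2).sum()                        # squared Frobenius norm
```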
5. Training Objective and Optimization
The overall tracking model objective combines:
- Classification loss $\mathcal{L}_{\mathrm{cls}}$ (binary cross-entropy on predicted heatmaps)
- Regression loss $\mathcal{L}_{\mathrm{reg}}$ (Generalized IoU)
- Orthogonality regularizer $\mathcal{L}_{\mathrm{orth}}$

The total loss is

$$\mathcal{L} = \mathcal{L}_{\mathrm{cls}} + \mathcal{L}_{\mathrm{reg}} + \lambda\,\mathcal{L}_{\mathrm{orth}},$$

with $\lambda$ a small constant held fixed across all experiments.
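A minimal sketch of the combined objective (tensor layouts and helper names are assumptions; `orthogonality_penalty` refers to the sketch in Section 4, and the value of `lam` is left unspecified, as above):

```python
import torch.nn.functional as F
from torchvision.ops import generalized_box_iou_loss

def total_loss(heatmap_logits, gt_heatmap, pred_boxes, gt_boxes, group_B, lam):
    """Combined tracking objective: heatmap BCE + GIoU + weighted orthogonality."""
    cls_loss = F.binary_cross_entropy_with_logits(heatmap_logits, gt_heatmap)
    reg_loss = generalized_box_iou_loss(pred_boxes, gt_boxes, reduction="mean")
    return cls_loss + reg_loss + lam * orthogonality_penalty(group_B)
```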
6. Empirical Performance
GOLA exhibits improved parameter efficiency and performance over baseline LoRA and other parameter-efficient fine-tuning techniques (Adapter, VPT, (IA)³, AdaLoRA, DoRA). Two GOLA variants were implemented:
- GOLA-B (DINOv2-B224 backbone): 99M parameters, 10% trainable, 85 GFLOPs, 125 fps on RTX 3090.
- GOLA-L (DINOv2-L224 backbone): 336M parameters, 8% trainable, 284 GFLOPs, 64 fps.
Empirical results on four benchmarks validate GOLA’s superiority:
| Dataset | Metric | Best Prior | GOLA-B | GOLA-L |
|---|---|---|---|---|
| GTOT (50 seq) | MPR | 93.2% | 92.8% | 95.3% |
| | MSR | 77.2% | 78.5% | 80.9% |
| RGBT210 (210k frames) | PR | 89.9% | 90.9% | 92.0% |
| | SR | 65.9% | 67.0% | 68.7% |
| RGBT234 (234k frames) | MPR | 92.1% | 92.2% | 92.8% |
| | MSR | 69.2% | 69.5% | 71.3% |
| LasHeR (735k frames) | PR | 76.9% | 77.5% | 78.1% |
| | NPR | 74.5% | 73.9% | 74.5% |
| | SR | 60.9% | 61.6% | 61.9% |
Compared to LoRA (13% trainable), GOLA-B reduces trainable parameters by 23% while improving LasHeR PR/SR from 76.3%/60.7% to 77.5%/61.6%. Across benchmarks, clustered orthogonality consistently outperforms full fine-tuning and existing parameter-efficient fine-tuning methods (Shao et al., 5 Dec 2025).
7. Context and Implications
GOLA exemplifies a new direction in parameter-efficient model adaptation by leveraging explicit rank-space decomposition, targeted parameter freezing, and structured orthogonality to reduce redundancy and improve representation diversity in adapters. While the primary evaluation has centered on RGB-T tracking, a plausible implication is that the methodology of structured rank decomposition and orthogonal grouping could generalize to other settings where low-rank adaptation and parameter efficiency are critical, such as other vision modalities or large language models. GOLA's design choices and demonstrated empirical advantages motivate further investigation into the principled structuring of low-rank adaptation spaces.