Tied-LoRA: Efficient Low-Rank Adaptation
- Tied-LoRA is a parameter-efficient extension of LoRA that uses weight tying and selective training to compress adaptation overhead in large-scale models.
- It maintains competitive performance across diverse tasks such as QA, summarization, and machine translation while significantly reducing trainable parameters.
- Its design allows seamless merging with base models for on-device deployment, enabling scalable and storage-efficient multi-task fine-tuning.
Tied-LoRA is a parameter-efficient extension of Low-Rank Adaptation (LoRA) that addresses the storage and computational challenges inherent to large-scale, multi-task fine-tuning scenarios. By introducing layerwise weight tying and selective parameter training, Tied-LoRA compresses adaptation overhead while maintaining competitive model performance. This paradigm has been empirically validated on multiple tasks and architectures, demonstrating its efficacy in real-world LLM customization workflows.
1. Conceptual Foundation and Motivation
Large-scale LLMs require significant resources to fine-tune and store per-task or per-user customizations. Parameter-Efficient Fine-Tuning (PEFT) methods, such as LoRA, mitigate this by injecting low-rank trainable updates into frozen pretrained weights. However, as tasks and variants proliferate, even LoRA's per-layer adaptation overhead compounds.
Tied-LoRA targets this bottleneck by:
- Weight tying: Sharing a single set of LoRA’s low-rank matrices across all transformer layers rather than parameterizing unique adaptations per layer.
- Selective training: Explicitly controlling which components in the low-rank update (projection matrices, scaling vectors) are trainable versus frozen, defining a spectrum of configuration granularities.
These design choices seek to retain most of LoRA's adaptation capability while using a dramatically reduced number of trainable parameters—crucial for scalable and on-device deployment scenarios.
2. Technical Formulation and Parameterization Spectrum
2.1 LoRA and Tied-LoRA Generalization
Standard LoRA extends a frozen pretrained linear weight $W_0$ via $W = W_0 + \frac{\alpha}{r} BA$, with $B \in \mathbb{R}^{d \times r}$, $A \in \mathbb{R}^{r \times d}$, and $\alpha$ a scalar. Each transformer module typically receives its own uniquely parameterized pair $(A, B)$.
Tied-LoRA generalizes this to $W = W_0 + \Lambda_v B \Lambda_u A$, where $A \in \mathbb{R}^{r \times d}$ and $B \in \mathbb{R}^{d \times r}$ are low-rank matrices (potentially tied across layers), and $\Lambda_u = \mathrm{diag}(u)$ (input-side, $u \in \mathbb{R}^{r}$) and $\Lambda_v = \mathrm{diag}(v)$ (output-side, $v \in \mathbb{R}^{d}$) are diagonal scaling matrices, i.e., scaling vectors, per layer. Selective tying (sharing across layers) and training (frozen or trainable) of any of these components yields a continuum between LoRA, Tied-LoRA, and even more aggressive parameter reductions (see, for example, VeRA).
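To make the formulation concrete, the following is a minimal PyTorch sketch of the generalized update $W = W_0 + \Lambda_v B \Lambda_u A$ with a single tied $(A, B)$ pair shared across layers and per-layer scaling vectors $u, v$. Class and variable names are illustrative, not taken from the authors' implementation.

```python
import torch
import torch.nn as nn


class TiedLoRALinear(nn.Module):
    """Frozen linear layer plus a low-rank update whose A/B are shared across layers."""

    def __init__(self, base: nn.Linear, tied_A: nn.Parameter, tied_B: nn.Parameter,
                 train_scales: bool = True):
        super().__init__()
        self.base = base
        for p in self.base.parameters():      # pretrained weight W_0 stays frozen
            p.requires_grad = False
        self.A = tied_A                       # (r, d_in), tied across layers
        self.B = tied_B                       # (d_out, r), tied across layers
        r = tied_A.shape[0]
        d_out = tied_B.shape[0]
        # Per-layer scaling vectors: u on the rank side, v on the output side.
        self.u = nn.Parameter(torch.ones(r), requires_grad=train_scales)
        self.v = nn.Parameter(torch.ones(d_out), requires_grad=train_scales)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # delta(x) = v * (B (u * (A x)))  ==  (Lambda_v B Lambda_u A) x
        h = (x @ self.A.T) * self.u                      # down-project, scale by u
        return self.base(x) + (h @ self.B.T) * self.v    # up-project, scale by v


# A single (A, B) pair is shared by every wrapped layer, so the tied low-rank
# parameters are stored and trained only once for the whole network.
d, r, num_layers = 1024, 8, 4
A = nn.Parameter(torch.randn(r, d) * 0.01)    # small random init
B = nn.Parameter(torch.zeros(d, r))           # zero init => update starts at zero
layers = [TiedLoRALinear(nn.Linear(d, d), A, B, train_scales=True)
          for _ in range(num_layers)]
```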
2.2 Configuration Table
A representative survey of configurations is shown below (letting $L$ denote the number of layers, $d$ the hidden size, and $r$ the rank):
| Method | Trainable Elements | Trainable Parameters |
|---|---|---|
| LoRA ($vBuA$) | All components, per-layer | $4Ldr$ |
| TL5 ($vB_g uA_g$) | $A, B$ tied and trainable; $u, v$ frozen | $4dr$ |
| TL6 ($vB_g uA_g$) | $A, B$ tied and trainable; $u, v$ trainable, per-layer | $4dr + L(r+3d)$ |
| VeRA | $A, B$ tied and frozen; $u, v$ trainable, per-layer | $L(r+3d)$ |
The greatest savings arise from tying the low-rank matrices $A$ and $B$ across all layers. Selectively training the per-layer scaling vectors ($u$, $v$) in TL6 increases flexibility at a moderate parameter cost.
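The scaling of these counts with $L$, $d$, and $r$ can be checked with a short sketch that simply encodes the formulas from the table; the dimensions below are LLaMA-2 7B-like values chosen for illustration, and the $3d$ term is assumed to reflect the fused output dimension of the adapted projection.

```python
# Trainable-parameter counts from the table above, for L layers,
# hidden size d, and rank r, using the table's 4dr-per-layer convention.
def lora_params(L: int, d: int, r: int) -> int:
    return 4 * L * d * r                 # per-layer A and B, no tying

def tl5_params(L: int, d: int, r: int) -> int:
    return 4 * d * r                     # one tied A/B pair for all layers

def tl6_params(L: int, d: int, r: int) -> int:
    return 4 * d * r + L * (r + 3 * d)   # tied A/B plus per-layer u, v vectors

# Example: LLaMA-2 7B-like setting (L=32, d=4096, r=8).
L, d, r = 32, 4096, 8
base = lora_params(L, d, r)
for name, fn in [("LoRA", lora_params), ("TL5", tl5_params), ("TL6", tl6_params)]:
    n = fn(L, d, r)
    print(f"{name}: {n:,} trainable params ({100 * n / base:.1f}% of LoRA)")
```

With these dimensions, TL5 and TL6 come out to roughly 3.1% and 12.5% of LoRA's trainable parameters, consistent with the results table in Section 4.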
3. Experimental Evaluation Across Diverse Tasks
3.1 Datasets and Tasks
Empirical validation is reported over five distinct benchmarks:
- Extractive QA (SQuAD v1; EM accuracy)
- Summarization (DialogSum; ROUGE-L)
- Commonsense NLI (HellaSwag; MC accuracy)
- Machine Translation (IWSLT 2017 De→En; BLEU)
- Mathematical Reasoning (GSM8K; EM accuracy)
3.2 Base Models and Training Regime
- Backbones: LLaMA-2 7B and GPT-2B-001.
- Training: NVIDIA NeMo; AdamW; cosine learning rate annealing; early stopping; no extensive hyperparameter tuning.
4. Quantitative Results and Trade-offs
4.1 Performance vs Parameter Curve
Empirical findings highlight:
- TL6: Delivers performance close to that of classic LoRA across tasks while using only a small fraction of its parameters (roughly 12.5% in the setting reported below, and less at higher ranks). For machine translation, TL6 slightly outperformed LoRA.
- TL5: Achieves even greater compression (e.g., about 3% of LoRA's parameters for LLaMA-2 7B at rank 8), with small accuracy drops relative to both TL6 and LoRA.
- Performance is robust across a wide range of ranks, except for some extreme compression configurations or tasks that demand more adaptation capacity (e.g., GSM8K).
- Layer-sharing (tying across layers) is markedly superior to simply reducing LoRA to a single-layer adaptation at fixed parameter counts.
| Method | Translation (BLEU) | QA (EM) | NLI (Acc) | Summarization (ROUGE-L) | Params (% LoRA) |
|---|---|---|---|---|---|
| LoRA | 41.3 | 88.5 | 91.97 | 40.76 | 100 |
| TL6 | 41.33 | 87.97 | 91.15 | 39.24 | 12.5 |
| TL5 | 41.01 | 87.11 | 91.75 | 40.62 | 3.1 |
| VeRA | 40.41 | 87.69 | 90.47 | 40.07 | 9.4 |
4.2 Task Dependence and Sensitivity
- Tasks requiring substantial adaptation capacity (such as mathematical reasoning) exhibit a more pronounced gap between LoRA and highly compressed Tied-LoRA variants.
- Tasks closely matched to the base model’s pretraining distribution (NLI, QA, summarization) are less sensitive to adaptation parameter reduction.
5. Methodological Nuances and Implications
Tied-LoRA enables a "drop-in" replacement for LoRA: the adaptation can be merged with base weights for inference, with no additional computational overhead. The approach is particularly advantageous for:
- Scenarios with deep transformers or many concurrent task/user customizations.
- Use cases where device or network bandwidth limits adapter deployment (due to significant reduction in adapter size).
- Situations demanding rapid deployment or storage-efficient sharing of fine-tuned variants.
A key finding is that tying (sharing) low-rank matrices across layers, combined with judiciously chosen scaling vector parameterization, can serve a vast majority of LoRA's adaptation role with only a modest subset of its original parameter footprint.
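The merge mentioned above can be sketched directly: the generalized update $\mathrm{diag}(v)\,B\,\mathrm{diag}(u)\,A$ is folded into the frozen base weight so that inference uses one dense matrix and no adapter code. Function and variable names below are illustrative.

```python
import torch


@torch.no_grad()
def merge_tied_lora(W0: torch.Tensor, A: torch.Tensor, B: torch.Tensor,
                    u: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """Return W0 + diag(v) @ B @ diag(u) @ A as a single dense weight."""
    delta = (B * u) @ A              # B * u == B @ diag(u): scale B's columns by u
    delta = delta * v.unsqueeze(1)   # row-wise scale by v == diag(v) @ (...)
    return W0 + delta


# Toy usage with illustrative shapes; with B initialized to zero the merged
# weight equals the original W0.
d_out, d_in, r = 16, 16, 4
W0 = torch.randn(d_out, d_in)
A, B = torch.randn(r, d_in), torch.zeros(d_out, r)
u, v = torch.ones(r), torch.ones(d_out)
W_merged = merge_tied_lora(W0, A, B, u, v)
```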
6. Comparison to Other Parameter-Efficient Fine-Tuning Approaches
Tied-LoRA exists on a spectrum with other PEFT methods:
- Classic LoRA: Most flexible, parameter-expensive.
- VeRA: Even more parameter-efficient (freezes the tied low-rank matrices and trains only per-layer scaling vectors); can be less performant than Tied-LoRA on some tasks.
- Uni-LoRA (Li et al., 1 Jun 2025): Extends these ideas, unifying LoRA, Tied-LoRA, and related approaches as instances of subspace projection with explicit global parameter sharing. Tied-LoRA corresponds to a block-diagonal, layer-local projection in this framework, while Uni-LoRA employs a global, isometric projection maximizing sharing and efficiency.
A distinct feature of Tied-LoRA is its flexibility—by selecting which matrices/vectors to tie and/or train, one can interpolate between expressiveness and efficiency.
7. Formalism Summary
- Generalized Tied-LoRA update: $W = W_0 + \Lambda_v B \Lambda_u A$
- $A, B$: Low-rank matrices, either tied across layers or uniquely parameterized.
- $\Lambda_u = \mathrm{diag}(u)$, $\Lambda_v = \mathrm{diag}(v)$: Diagonal scaling matrices/vectors, potentially per-layer and trainable.
- Canonical configurations (see the sketch after this list):
- TL5: $A, B$ tied and trainable; $u, v$ frozen; all adaptation via the global low-rank matrices.
- TL6: $A, B$ tied and trainable; $u, v$ per-layer and trainable; balances global adaptation flexibility with moderate per-layer adjustment.
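As a minimal sketch of the selective-training dimension (assuming adapter modules that expose tied $A, B$ and per-layer $u, v$, as in the Section 2.1 sketch), TL5 and TL6 differ only in which components are left trainable:

```python
# Toggle which Tied-LoRA components are trainable; "layers" are assumed to be
# modules exposing tied A/B and per-layer u/v, as in the earlier sketch.
def configure_tied_lora(layers, variant: str = "TL6") -> None:
    train_scales = (variant == "TL6")      # TL6 trains u, v; TL5 freezes them
    for layer in layers:
        layer.A.requires_grad = True       # tied low-rank matrices train in both variants
        layer.B.requires_grad = True
        layer.u.requires_grad = train_scales
        layer.v.requires_grad = train_scales
```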
Tied-LoRA thus provides a parameterization regime that covers both the classic LoRA and more aggressive PEFT methods—enabling practitioners to tailor adaptation footprints to resource constraints and targeted task complexity.