Tied-LoRA: Efficient Low-Rank Adaptation
- Tied-LoRA is a parameter-efficient extension of LoRA that uses weight tying and selective training to compress adaptation overhead in large-scale models.
- It maintains competitive performance across diverse tasks such as QA, summarization, and machine translation while significantly reducing trainable parameters.
- Its design allows seamless merging with base models for on-device deployment, enabling scalable and storage-efficient multi-task fine-tuning.
Tied-LoRA is a parameter-efficient extension of Low-Rank Adaptation (LoRA) that addresses the storage and computational challenges inherent to large-scale, multi-task fine-tuning scenarios. By introducing layerwise weight tying and selective parameter training, Tied-LoRA compresses adaptation overhead while maintaining competitive model performance. This paradigm has been empirically validated on multiple tasks and architectures, demonstrating its efficacy in real-world LLM customization workflows.
1. Conceptual Foundation and Motivation
Large-scale LLMs require significant resources to fine-tune and store per-task or per-user customizations. Parameter-Efficient Fine-Tuning (PEFT) methods, such as LoRA, mitigate this by injecting low-rank trainable updates into frozen pretrained weights. However, as tasks and variants proliferate, even LoRA's per-layer adaptation overhead compounds.
Tied-LoRA targets this bottleneck by:
- Weight tying: Sharing a single set of LoRA’s low-rank matrices across all transformer layers rather than parameterizing unique adaptations per layer.
- Selective training: Explicitly controlling which components in the low-rank update (projection matrices, scaling vectors) are trainable versus frozen, defining a spectrum of configuration granularities.
These design choices seek to retain most of LoRA's adaptation capability while using a dramatically reduced number of trainable parameters—crucial for scalable and on-device deployment scenarios.
2. Technical Formulation and Parameterization Spectrum
2.1 LoRA and Tied-LoRA Generalization
Standard LoRA extends a frozen pretrained linear weight $W_0$ via $W = W_0 + \frac{\alpha}{r} BA$, with $B \in \mathbb{R}^{d \times r}$, $A \in \mathbb{R}^{r \times d}$, and $\alpha$ a scalar. Each transformer module typically receives its own uniquely parameterized pair $(A, B)$.
Tied-LoRA generalizes this to $W = W_0 + \Lambda_v B \Lambda_u A$, where $A \in \mathbb{R}^{r \times d}$ and $B \in \mathbb{R}^{d \times r}$ are low-rank matrices (potentially tied across layers), and $\Lambda_u = \mathrm{diag}(u)$ (input-side, $u \in \mathbb{R}^{r}$) and $\Lambda_v = \mathrm{diag}(v)$ (output-side, $v \in \mathbb{R}^{d}$) are diagonal scaling matrices, i.e., scaling vectors, per layer. Selective tying (sharing across layers) and training (frozen or trainable) of any of these components yields a continuum between LoRA, Tied-LoRA, and even more aggressive parameter reductions (see, for example, VeRA).
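To make the formulation concrete, the following is a minimal PyTorch sketch of the generalized update $W = W_0 + \Lambda_v B \Lambda_u A$ with a single tied $(A, B)$ pair shared across layers and per-layer scaling vectors $u, v$. Class and variable names are illustrative, not taken from the authors' implementation.

```python
import torch
import torch.nn as nn


class TiedLoRALinear(nn.Module):
    """Frozen linear layer plus a low-rank update whose A/B are shared across layers."""

    def __init__(self, base: nn.Linear, tied_A: nn.Parameter, tied_B: nn.Parameter,
                 train_scales: bool = True):
        super().__init__()
        self.base = base
        for p in self.base.parameters():      # pretrained weight W_0 stays frozen
            p.requires_grad = False
        self.A = tied_A                       # (r, d_in), tied across layers
        self.B = tied_B                       # (d_out, r), tied across layers
        r = tied_A.shape[0]
        d_out = tied_B.shape[0]
        # Per-layer scaling vectors: u on the rank side, v on the output side.
        self.u = nn.Parameter(torch.ones(r), requires_grad=train_scales)
        self.v = nn.Parameter(torch.ones(d_out), requires_grad=train_scales)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # delta(x) = v * (B (u * (A x)))  ==  (Lambda_v B Lambda_u A) x
        h = (x @ self.A.T) * self.u                      # down-project, scale by u
        return self.base(x) + (h @ self.B.T) * self.v    # up-project, scale by v


# A single (A, B) pair is shared by every wrapped layer, so the tied low-rank
# parameters are stored and trained only once for the whole network.
d, r, num_layers = 1024, 8, 4
A = nn.Parameter(torch.randn(r, d) * 0.01)    # small random init
B = nn.Parameter(torch.zeros(d, r))           # zero init => update starts at zero
layers = [TiedLoRALinear(nn.Linear(d, d), A, B, train_scales=True)
          for _ in range(num_layers)]
```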
2.2 Configuration Table
A representative survey of configurations is shown below (letting $L$ denote the number of layers, $d$ the hidden size, and $r$ the rank):
| Method | Trainable Elements | Trainable Parameters |
|---|---|---|
| LoRA ($vBuA$) | All components, per-layer | $4Ldr$ |
| TL5 ($vB_g uA_g$) | $A, B$ tied and trainable; $u, v$ frozen | $4dr$ |
| TL6 ($vB_g uA_g$) | $A, B$ tied and trainable; $u, v$ trainable, per-layer | $4dr + L(r+3d)$ |
| VeRA | $A, B$ tied and frozen; $u, v$ trainable, per-layer | $L(r+3d)$ |
The greatest savings arise from tying the low-rank matrices $A$ and $B$ across all layers. Selectively training the per-layer scaling vectors ($u$, $v$) in TL6 increases flexibility at a moderate parameter cost.
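The scaling of these counts with $L$, $d$, and $r$ can be checked with a short sketch that simply encodes the formulas from the table; the dimensions below are LLaMA-2 7B-like values chosen for illustration, and the $3d$ term is assumed to reflect the fused output dimension of the adapted projection.

```python
# Trainable-parameter counts from the table above, for L layers,
# hidden size d, and rank r, using the table's 4dr-per-layer convention.
def lora_params(L: int, d: int, r: int) -> int:
    return 4 * L * d * r                 # per-layer A and B, no tying

def tl5_params(L: int, d: int, r: int) -> int:
    return 4 * d * r                     # one tied A/B pair for all layers

def tl6_params(L: int, d: int, r: int) -> int:
    return 4 * d * r + L * (r + 3 * d)   # tied A/B plus per-layer u, v vectors

# Example: LLaMA-2 7B-like setting (L=32, d=4096, r=8).
L, d, r = 32, 4096, 8
base = lora_params(L, d, r)
for name, fn in [("LoRA", lora_params), ("TL5", tl5_params), ("TL6", tl6_params)]:
    n = fn(L, d, r)
    print(f"{name}: {n:,} trainable params ({100 * n / base:.1f}% of LoRA)")
```

With these dimensions, TL5 and TL6 come out to roughly 3.1% and 12.5% of LoRA's trainable parameters, consistent with the results table in Section 4.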
3. Experimental Evaluation Across Diverse Tasks
3.1 Datasets and Tasks
Empirical validation is reported over five distinct benchmarks:
- Extractive QA (SQuAD v1; EM accuracy)
- Summarization (DialogSum; ROUGE-L)
- Commonsense NLI (HellaSwag; MC accuracy)
- Machine Translation (IWSLT 2017 De→En; BLEU)
- Mathematical Reasoning (GSM8K; EM accuracy)
3.2 Base Models and Training Regime
- Backbones: LLaMA-2 7B and GPT-2B-001.
- Training: NVIDIA NeMo; AdamW; cosine learning rate annealing; early stopping; no extensive hyperparameter tuning.
4. Quantitative Results and Trade-offs
4.1 Performance vs Parameter Curve
Empirical findings highlight:
- TL6: Delivers performance close to that of classic LoRA across tasks while using only a small fraction of its parameters (roughly 12.5% in the setting reported below, and less at higher ranks). For machine translation, TL6 slightly outperformed LoRA.
- TL5: Achieves even greater compression (e.g., about 3% of LoRA's parameters for LLaMA-2 7B at rank 8), with small accuracy drops relative to both TL6 and LoRA.
- Performance is robust across a wide range of ranks, except for some extreme compression configurations or tasks that demand more adaptation capacity (e.g., GSM8K).
- Layer-sharing (tying across layers) is markedly superior to simply reducing LoRA to a single-layer adaptation at fixed parameter counts.
| Method | Translation (BLEU) | QA (EM) | NLI (Acc) | Summarization (ROUGE-L) | Params (% LoRA) |
|---|---|---|---|---|---|
| LoRA | 41.3 | 88.5 | 91.97 | 40.76 | 100 |
| TL6 | 41.33 | 87.97 | 91.15 | 39.24 | 12.5 |
| TL5 | 41.01 | 87.11 | 91.75 | 40.62 | 3.1 |
| VeRA | 40.41 | 87.69 | 90.47 | 40.07 | 9.4 |
4.2 Task Dependence and Sensitivity
- Tasks requiring substantial adaptation capacity (such as mathematical reasoning) exhibit a more pronounced gap between LoRA and highly compressed Tied-LoRA variants.
- Tasks closely matched to the base model’s pretraining distribution (NLI, QA, summarization) are less sensitive to adaptation parameter reduction.
5. Methodological Nuances and Implications
Tied-LoRA enables a "drop-in" replacement for LoRA: the adaptation can be merged with base weights for inference, with no additional computational overhead. The approach is particularly advantageous for:
- Scenarios with deep transformers or many concurrent task/user customizations.
- Use cases where device or network bandwidth limits adapter deployment (due to significant reduction in adapter size).
- Situations demanding rapid deployment or storage-efficient sharing of fine-tuned variants.
A key finding is that tying (sharing) low-rank matrices across layers, combined with judiciously chosen scaling vector parameterization, can serve a vast majority of LoRA's adaptation role with only a modest subset of its original parameter footprint.
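The merge mentioned above can be sketched directly: the generalized update $\mathrm{diag}(v)\,B\,\mathrm{diag}(u)\,A$ is folded into the frozen base weight so that inference uses one dense matrix and no adapter code. Function and variable names below are illustrative.

```python
import torch


@torch.no_grad()
def merge_tied_lora(W0: torch.Tensor, A: torch.Tensor, B: torch.Tensor,
                    u: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """Return W0 + diag(v) @ B @ diag(u) @ A as a single dense weight."""
    delta = (B * u) @ A              # B * u == B @ diag(u): scale B's columns by u
    delta = delta * v.unsqueeze(1)   # row-wise scale by v == diag(v) @ (...)
    return W0 + delta


# Toy usage with illustrative shapes; with B initialized to zero the merged
# weight equals the original W0.
d_out, d_in, r = 16, 16, 4
W0 = torch.randn(d_out, d_in)
A, B = torch.randn(r, d_in), torch.zeros(d_out, r)
u, v = torch.ones(r), torch.ones(d_out)
W_merged = merge_tied_lora(W0, A, B, u, v)
```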
6. Comparison to Other Parameter-Efficient Fine-Tuning Approaches
Tied-LoRA exists on a spectrum with other PEFT methods:
- Classic LoRA: Most flexible, parameter-expensive.
- VeRA: Even more parameter-efficient (freezes the tied low-rank matrices and trains only per-layer scaling vectors); can be less performant than Tied-LoRA on some tasks.
- Uni-LoRA (Li et al., 1 Jun 2025): Extends these ideas, unifying LoRA, Tied-LoRA, and related approaches as instances of subspace projection with explicit global parameter sharing. Tied-LoRA corresponds to a block-diagonal, layer-local projection in this framework, while Uni-LoRA employs a global, isometric projection maximizing sharing and efficiency.
A distinct feature of Tied-LoRA is its flexibility—by selecting which matrices/vectors to tie and/or train, one can interpolate between expressiveness and efficiency.
7. Formalism Summary
- Generalized Tied-LoRA update: $W = W_0 + \Lambda_v B \Lambda_u A$
- $A, B$: Low-rank matrices, either tied across layers or uniquely parameterized.
- $\Lambda_u = \mathrm{diag}(u)$, $\Lambda_v = \mathrm{diag}(v)$: Diagonal scaling matrices/vectors, potentially per-layer and trainable.
- Canonical configurations (see the sketch after this list):
- TL5: $A, B$ tied and trainable; $u, v$ frozen; all adaptation via the global low-rank matrices.
- TL6: $A, B$ tied and trainable; $u, v$ per-layer and trainable; balances global adaptation flexibility with moderate per-layer adjustment.
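As a minimal sketch of the selective-training dimension (assuming adapter modules that expose tied $A, B$ and per-layer $u, v$, as in the Section 2.1 sketch), TL5 and TL6 differ only in which components are left trainable:

```python
# Toggle which Tied-LoRA components are trainable; "layers" are assumed to be
# modules exposing tied A/B and per-layer u/v, as in the earlier sketch.
def configure_tied_lora(layers, variant: str = "TL6") -> None:
    train_scales = (variant == "TL6")      # TL6 trains u, v; TL5 freezes them
    for layer in layers:
        layer.A.requires_grad = True       # tied low-rank matrices train in both variants
        layer.B.requires_grad = True
        layer.u.requires_grad = train_scales
        layer.v.requires_grad = train_scales
```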
Tied-LoRA thus provides a parameterization regime that covers both the classic LoRA and more aggressive PEFT methods—enabling practitioners to tailor adaptation footprints to resource constraints and targeted task complexity.