
Tied-LoRA: Efficient Low-Rank Adaptation

Updated 6 November 2025
  • Tied-LoRA is a parameter-efficient extension of LoRA that uses weight tying and selective training to compress adaptation overhead in large-scale models.
  • It maintains competitive performance across diverse tasks such as QA, summarization, and machine translation while significantly reducing trainable parameters.
  • Its design allows seamless merging with base models for on-device deployment, enabling scalable and storage-efficient multi-task fine-tuning.

Tied-LoRA is a parameter-efficient extension of Low-Rank Adaptation (LoRA) that addresses the storage and computational challenges inherent to large-scale, multi-task fine-tuning scenarios. By introducing layerwise weight tying and selective parameter training, Tied-LoRA compresses adaptation overhead while maintaining competitive model performance. This paradigm has been empirically validated on multiple tasks and architectures, demonstrating its efficacy in real-world LLM customization workflows.

1. Conceptual Foundation and Motivation

Large-scale LLMs require significant resources to fine-tune and store per-task or per-user customizations. Parameter-Efficient Fine-Tuning (PEFT) methods, such as LoRA, mitigate this by injecting low-rank trainable updates into frozen pretrained weights. However, as tasks and variants proliferate, even LoRA's per-layer adaptation overhead compounds.

Tied-LoRA targets this bottleneck by:

  • Weight tying: Sharing a single set of LoRA’s low-rank matrices across all transformer layers rather than parameterizing unique adaptations per layer.
  • Selective training: Explicitly controlling which components in the low-rank update (projection matrices, scaling vectors) are trainable versus frozen, defining a spectrum of configuration granularities.

These design choices seek to retain most of LoRA's adaptation capability while using a dramatically reduced number of trainable parameters—crucial for scalable and on-device deployment scenarios.

2. Technical Formulation and Parameterization Spectrum

2.1 LoRA and Tied-LoRA Generalization

Standard LoRA extends a frozen pretrained linear weight $W$ via $z = Wx + \Delta W x = Wx + \alpha \cdot BAx$, with $A \in \mathbb{R}^{r \times d}$, $B \in \mathbb{R}^{d' \times r}$, and $\alpha$ a scalar. Each transformer module typically receives its own uniquely parameterized $A, B$.

Tied-LoRA generalizes this to $z = Wx + \alpha \cdot \Lambda_v B \Lambda_u A x$, where $A$ and $B$ are the low-rank matrices (potentially tied across layers), and $\Lambda_u = \mathrm{diag}(u)$ (input side of $B$, scaling the rank-$r$ intermediate, $u \in \mathbb{R}^r$) and $\Lambda_v = \mathrm{diag}(v)$ (output side, $v \in \mathbb{R}^{d'}$) are diagonal scaling matrices built from per-layer vectors. Selective tying (sharing across layers) and training (frozen or trainable) of any of these components yields a continuum between LoRA, Tied-LoRA, and even more aggressive parameter reductions (see, for example, VeRA).
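The parameterization can be sketched in a few lines of PyTorch. The module below is an illustrative assumption (names such as `TiedLoRALinear` and `train_uv` are not from the paper): a single pair of low-rank matrices is created once and shared by every adapted layer, while each layer keeps its own scaling vectors.

```python
import torch
import torch.nn as nn

class TiedLoRALinear(nn.Module):
    """One adapted projection: z = W x + alpha * diag(v) B diag(u) A x.

    A and B are passed in and shared (tied) across layers; u and v are per-layer.
    """
    def __init__(self, base_linear: nn.Linear, A: nn.Parameter, B: nn.Parameter,
                 alpha: float = 1.0, train_uv: bool = True):
        super().__init__()
        self.base = base_linear                  # frozen pretrained projection W (and bias)
        for p in self.base.parameters():
            p.requires_grad = False
        self.A, self.B = A, B                    # tied low-rank matrices: A is (r, d), B is (d_out, r)
        r, d_out = A.shape[0], B.shape[0]
        self.u = nn.Parameter(torch.ones(r), requires_grad=train_uv)      # scales the rank-r intermediate
        self.v = nn.Parameter(torch.ones(d_out), requires_grad=train_uv)  # scales the output
        self.alpha = alpha

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = (x @ self.A.T) * self.u              # A x, then diag(u)
        h = (h @ self.B.T) * self.v              # B (.), then diag(v)
        return self.base(x) + self.alpha * h

# The tied matrices are created once and handed to every adapted layer.
d, d_out, r, n_layers = 4096, 4096, 8, 32        # illustrative sizes
A = nn.Parameter(torch.randn(r, d) * 0.01)       # trainable in TL5/TL6, frozen in VeRA
B = nn.Parameter(torch.zeros(d_out, r))          # zero-init so the adapter starts as a no-op
layers = [TiedLoRALinear(nn.Linear(d, d_out), A, B) for _ in range(n_layers)]
```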

2.2 Configuration Table

A representative survey of configurations is shown below (letting $L$ denote the number of layers, $d$ the hidden size, and $r$ the rank; the $3d$ term is the output dimension of the adapted projection):

| Method | Trainable Elements | Trainable Parameters |
|---|---|---|
| LoRA ($vBuA$) | All components, per layer | $4Ldr$ |
| TL5 ($vB_g\,uA_g$) | $A, B$ tied; $u, v$ frozen | $4dr$ |
| TL6 ($vB_g\,uA_g$) | $A, B$ tied; $u, v$ trainable, per layer | $4dr + L(r + 3d)$ |
| VeRA | $A, B$ tied and frozen; $u, v$ trainable | $L(r + 3d)$ |

The greatest savings arise from tying the $A, B$ matrices across all layers. Selectively training the per-layer scaling vectors ($u, v$) in TL6 increases flexibility at a moderate parameter cost.
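As a quick sanity check of these formulas, the snippet below plugs in $L = 32$, $d = 4096$, $r = 8$ (roughly LLaMA-2 7B at rank 8; illustrative values) and reproduces the relative footprints quoted in the results section:

```python
# Trainable-parameter formulas from the table above, evaluated at assumed sizes:
# L = 32 layers, hidden size d = 4096, rank r = 8 (roughly LLaMA-2 7B at rank 8).
L, d, r = 32, 4096, 8

counts = {
    "LoRA": 4 * L * d * r,                 # per-layer A and B
    "TL6":  4 * d * r + L * (r + 3 * d),   # tied A, B plus per-layer u, v
    "TL5":  4 * d * r,                     # tied A, B only
    "VeRA": L * (r + 3 * d),               # per-layer u, v only
}

for name, n in counts.items():
    print(f"{name:5s} {n:>10,d}  ({100 * n / counts['LoRA']:.1f}% of LoRA)")

# LoRA   4,194,304  (100.0% of LoRA)
# TL6      524,544  (12.5% of LoRA)
# TL5      131,072  (3.1% of LoRA)
# VeRA     393,472  (9.4% of LoRA)
```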

3. Experimental Evaluation Across Diverse Tasks

3.1 Datasets and Tasks

Empirical validation is reported over five distinct benchmarks:

  • Extractive QA (SQuAD v1; EM accuracy)
  • Summarization (DialogSum; ROUGE-L)
  • Commonsense NLI (HellaSwag; MC accuracy)
  • Machine Translation (IWSLT 2017 De→En; BLEU)
  • Mathematical Reasoning (GSM8K; EM accuracy)

3.2 Base Models and Training Regime

  • Backbones: LLaMA-2 7B and GPT-2B-001.
  • Training: NVIDIA NeMo; AdamW; cosine learning rate annealing; early stopping; $r \in \{2, 4, 8, \ldots, 128\}$; no extensive hyperparameter tuning (a generic sketch of this recipe follows below).
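The experiments themselves use NeMo; purely as a generic illustration of the same recipe (optimizing only the trainable adapter parameters with AdamW and cosine annealing), a hedged PyTorch sketch with placeholder hyperparameters might look like this:

```python
import torch
import torch.nn as nn

# Toy stand-in: a frozen backbone plus one trainable adapter vector. In a real
# Tied-LoRA setup only A, B, u, v (per the chosen configuration) have requires_grad=True.
backbone = nn.Linear(4096, 4096)
for p in backbone.parameters():
    p.requires_grad = False
u = nn.Parameter(torch.ones(8))                  # e.g. one per-layer scaling vector

trainable = [p for p in [*backbone.parameters(), u] if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-4, weight_decay=0.01)            # placeholder values
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=1_000)  # cosine annealing
```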

4. Quantitative Results and Trade-offs

4.1 Performance vs Parameter Curve

Empirical findings highlight:

  • TL6: Delivers performance within 1–2% of classic LoRA across tasks while using only 12.5% (or less) of its parameters at higher ranks. For machine translation, TL6 slightly outperformed LoRA.
  • TL5: Achieves even greater compression (e.g., 3% of LoRA's parameters for LLaMA-2 7B at rank 8), with small accuracy drops relative to both TL6 and LoRA.
  • Performance is robust across a wide rank range, except for some extreme compression configurations or heavily capacity-limited tasks (e.g., GSM8K).
  • Layer-sharing (tying $A, B$ across layers) is markedly superior to simply reducing LoRA to a single-layer adaptation at fixed parameter counts.

| Method | Translation (BLEU) | QA (EM) | NLI (Acc) | Summarization (ROUGE-L) | Params (% of LoRA) |
|---|---|---|---|---|---|
| LoRA | 41.3 | 88.5 | 91.97 | 40.76 | 100 |
| TL6 | 41.33 | 87.97 | 91.15 | 39.24 | 12.5 |
| TL5 | 41.01 | 87.11 | 91.75 | 40.62 | 3.1 |
| VeRA | 40.41 | 87.69 | 90.47 | 40.07 | 9.4 |

4.2 Task Dependence and Sensitivity

  • Tasks requiring substantial adaptation capacity (such as mathematical reasoning) exhibit a more pronounced gap between LoRA and highly compressed Tied-LoRA variants.
  • Tasks closely matched to the base model’s pretraining distribution (NLI, QA, summarization) are less sensitive to adaptation parameter reduction.

5. Methodological Nuances and Implications

Tied-LoRA enables a "drop-in" replacement for LoRA: the adaptation can be merged with base weights for inference, with no additional computational overhead. The approach is particularly advantageous for:

  • Scenarios with deep transformers or many concurrent task/user customizations.
  • Use cases where device or network bandwidth limits adapter deployment (due to significant reduction in adapter size).
  • Situations demanding rapid deployment or storage-efficient sharing of fine-tuned variants.

A key finding is that tying (sharing) low-rank matrices across layers, combined with judiciously chosen scaling vector parameterization, can serve a vast majority of LoRA's adaptation role with only a modest subset of its original parameter footprint.
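A minimal sketch of the merge step mentioned above, assuming the $W + \alpha \, \Lambda_v B \Lambda_u A$ parameterization from Section 2 (function and variable names are illustrative, not the reference implementation):

```python
import torch

def merge_tied_lora(W, A, B, u, v, alpha=1.0):
    """Fold a Tied-LoRA update into a frozen weight for inference.

    Assumed shapes: W (d_out, d) pretrained weight; A (r, d), B (d_out, r) tied
    low-rank matrices; u (r,), v (d_out,) per-layer scaling vectors.
    Returns W' = W + alpha * diag(v) @ B @ diag(u) @ A.
    """
    delta = (v[:, None] * B) @ (u[:, None] * A)   # diag(v) B diag(u) A, without building diagonal matrices
    return W + alpha * delta

# Illustrative sizes: hidden 4096, rank 8.
d, d_out, r = 4096, 4096, 8
W = torch.randn(d_out, d)
A, B = torch.randn(r, d), torch.randn(d_out, r)
u, v = torch.ones(r), torch.ones(d_out)
W_merged = merge_tied_lora(W, A, B, u, v, alpha=0.5)   # deploy W_merged; no extra inference cost
```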

6. Comparison to Other Parameter-Efficient Fine-Tuning Approaches

Tied-LoRA exists on a spectrum with other PEFT methods:

  • Classic LoRA: Most flexible, parameter-expensive.
  • VeRA: Even more parameter-efficient (freezes tied matrices, trains only per-layer/scaling vectors); can be less performant than Tied-LoRA in some tasks.
  • Uni-LoRA (Li et al., 1 Jun 2025): Extends these ideas, unifying LoRA, Tied-LoRA, and related approaches as instances of subspace projection with explicit global parameter sharing. Tied-LoRA corresponds to a block-diagonal, layer-local projection in this framework, while Uni-LoRA employs a global, isometric projection maximizing sharing and efficiency.

A distinct feature of Tied-LoRA is its flexibility—by selecting which matrices/vectors to tie and/or train, one can interpolate between expressiveness and efficiency.

7. Formalism Summary

  • Generalized Tied-LoRA update:

$z = Wx + \alpha \cdot \Lambda_v B \Lambda_u A x$

  • $A, B$: Low-rank matrices, either tied across layers or uniquely parameterized per layer.
  • $\Lambda_u = \mathrm{diag}(u)$, $\Lambda_v = \mathrm{diag}(v)$: Diagonal scaling matrices built from the vectors $u, v$, potentially per-layer and trainable.
  • Canonical configurations (see the sketch below):
    • TL5: $A, B$ tied; $u, v$ frozen; all adaptation happens via the global low-rank matrices.
    • TL6: $A, B$ tied; $u, v$ per-layer and trainable; balances global adaptation flexibility with moderate per-layer adjustment.
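In code, selecting one of these canonical configurations (or VeRA) amounts to toggling which tensors receive gradients. A minimal sketch, with the function and argument names as illustrative assumptions consistent with the table in Section 2.2:

```python
import torch.nn as nn

def configure_tied_lora(A: nn.Parameter, B: nn.Parameter, scaling_vectors, variant: str = "TL6"):
    """Pick a point on the LoRA <-> VeRA spectrum by toggling which parameters train.

    A, B: the low-rank matrices shared (tied) across all layers.
    scaling_vectors: iterable of the per-layer u and v parameters.
    """
    if variant == "TL5":       # tied A, B trainable; per-layer u, v frozen
        train_ab, train_uv = True, False
    elif variant == "TL6":     # tied A, B trainable; per-layer u, v trainable
        train_ab, train_uv = True, True
    elif variant == "VeRA":    # tied A, B frozen; only per-layer u, v trainable
        train_ab, train_uv = False, True
    else:
        raise ValueError(f"unknown variant: {variant}")

    A.requires_grad_(train_ab)
    B.requires_grad_(train_ab)
    for p in scaling_vectors:
        p.requires_grad_(train_uv)
```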

Tied-LoRA thus provides a parameterization regime that covers both the classic LoRA and more aggressive PEFT methods—enabling practitioners to tailor adaptation footprints to resource constraints and targeted task complexity.
