
Parameter-Efficient Finetuning Techniques

Updated 18 March 2026
  • Parameter-efficient finetuning techniques are methods that adapt large pre-trained models to downstream tasks by updating only a small fraction of parameters.
  • They leverage subspace decomposition strategies—either by reconstructing or extending model weights—to approach full fine-tuning performance with minimal computational overhead.
  • Empirical studies show that methods like SSL and SSB achieve up to 99% of full fine-tuning performance while tuning as little as 0.01–1% of parameters.

Parameter-efficient finetuning (PEFT) comprises a diverse family of methods designed to adapt large pre-trained foundation models to downstream tasks while updating only a small fraction (typically 0.01–1%) of parameters. By freezing the majority of model weights and introducing minimal adaptation modules or modifications, PEFT achieves nearly full fine-tuning performance while substantially reducing computational, storage, and optimization overheads, enabling scalable adaptation across tasks and domains (Si et al., 2024).

1. Motivation and Problem Context

With the rapid proliferation of foundation models containing billions of parameters, full fine-tuning for each downstream task is increasingly impractical due to computational demands and the prohibitive storage cost of maintaining separate model copies (Si et al., 2024). PEFT directly targets these bottlenecks by freezing the pretrained weights $W \in \mathbb{R}^{n \times m}$ and introducing a small delta module $\phi$ (often with only 0.01–1% of the total parameters) to approximate the performance $P(W^*)$ of the fully fine-tuned weights $W^*$, while minimizing both parameter count and training/inference time.

Design constraints for PEFT methods include minimizing trainable parameters, preserving throughput (no significant increase in forward/backward cost), and achieving downstream performance (e.g., accuracy, F1) close to that of fully tuned models (Si et al., 2024).
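To make the parameter-budget constraint concrete, here is a minimal sketch (the layer size and rank are hypothetical, not from the source) of the fraction of trainable parameters a rank-$r$ low-rank update adds relative to one frozen weight matrix:

```python
import numpy as np

def lora_param_fraction(n: int, m: int, r: int) -> float:
    """Fraction of trainable parameters a rank-r low-rank update
    (A: n x r, B: r x m) adds relative to a frozen n x m weight."""
    full = n * m            # parameters touched by full fine-tuning
    delta = r * (n + m)     # parameters in A and B
    return delta / full

# Hypothetical 4096 x 4096 attention projection with rank r = 8:
frac = lora_param_fraction(4096, 4096, 8)
print(f"{frac:.2%}")  # 0.39%
```

At typical transformer dimensions this lands squarely in the 0.1–1% range the survey cites for the LoRA family.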

2. Unifying Mathematical Framework: Subspace Decomposition

PEFT methods can be unified under a decomposition-based framework in which the pretrained weight matrix is viewed as spanning a subspace $S(W)$. Adaptation is then realized either by reconstructing/rescaling the original subspace (subspace reconstruction) or by augmenting it with additional directions (subspace extension):

  • Subspace Reconstruction: Apply a transformation $\phi(W) = g(f(W))$, where $f$ is a (possibly structured) reconstruction/scaling (e.g., singular-value modification, diagonal scaling) and $g$ may introduce new directions.
    • Mode 1: Only the singular values $\Sigma$ are updated, keeping $U, V$ fixed (e.g., SAM-PARSER).
    • Mode 2: Scale the left ($D_1$) and/or right ($D_2$) singular-vector spaces: $\phi(W) = D_1 W D_2$ with diagonal $D_1, D_2$.
    • Examples: (IA)$^3$ ($\phi(W) = W D_2$), SSL ($\phi(W) = D_1 W$), SSB ($\phi(W) = D_1 W D_2$).
    • Mode 3: Perturb $U$ or $V$ (e.g., BitFit, prefix or prompt tuning).
  • Subspace Extension: Add a low-rank update $\Delta W$ to expand $S(W)$:
    • LoRA family: $\Delta W = AB$ with $A \in \mathbb{R}^{n \times r}$, $B \in \mathbb{R}^{r \times m}$, $r \ll \min(n, m)$, and extensions such as AdaLoRA ($\Delta W = ADB$ with diagonal $D$) and FLoRA/TriLoRA ($\Delta W = AGB$ with unrestricted $G$).
    • Adapters: Insert a small MLP in parallel, yielding $\phi(W): xW \mapsto xW + h(xA)B$.
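The subspace-extension form $\Delta W = AGB$ can be sketched in a few lines of NumPy (dimensions and initialization scales are illustrative, not from the source); LoRA corresponds to fixing $G = I$, while FLoRA/TriLoRA also train $G$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, r = 64, 32, 4

W = rng.standard_normal((n, m))       # frozen pretrained weight
A = rng.standard_normal((n, r)) * 0.01  # trainable down-projection
B = rng.standard_normal((r, m)) * 0.01  # trainable up-projection
G = np.eye(r)                         # LoRA: G = I; FLoRA/TriLoRA: G trainable

def adapted_forward(x, W, A, G, B):
    """Subspace extension: y = x (W + A G B); W stays frozen."""
    return x @ W + (x @ A) @ G @ B

x = rng.standard_normal((8, n))
y = adapted_forward(x, W, A, G, B)
# The update can never add more than r new directions to S(W):
assert np.linalg.matrix_rank(A @ G @ B) <= r
```

Computing `(x @ A) @ G @ B` rather than materializing $W + AGB$ keeps the extra forward cost proportional to $r$, which is why throughput is preserved.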

The unified training objective is to produce an adapted weight $\phi(W)$ whose span projects maximally onto the unknown optimum $W^*$:

$$\min_\phi \| W^* - \phi(W) \|_F$$

(Si et al., 2024).

3. Key Parameter-Efficient Finetuning Methods

PEFT strategies can be grouped as follows:

| Method | Adaptation Mechanism | Typical Parameter Overhead (%) | Workspace |
|---|---|---|---|
| BitFit | Bias-only tuning | 0.01–0.1 | Biases (all layers) |
| (IA)$^3$, SSL, SSB | Diagonal scaling (row/col) | 0.01–0.1 | Weights |
| LoRA, AdaLoRA, FLoRA | Low-rank update ($AB$, $ADB$) | 0.1–1.0 | Attention, FFN |
| Adapter (Houlsby, Pfeiffer) | Bottleneck MLP insertion | 0.1–4.0 | After attention/FFN |
| Prefix/Prompt tuning | Learnable embeddings | 0.05–0.5 | Attention keys/values |
| SVD-based methods | Modifying singular values | 0.01–0.5 | Weights |

(Si et al., 2024)

Novel classes: SSL (scale-left subspace: row scaling only, $n$ scalars per layer) and SSB (scale-both subspace: input and output scaling, $n + m$ scalars per layer) offer extreme sparsity and reach 85–99% of full fine-tuning performance while tuning under 0.1% of parameters (Si et al., 2024).
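A minimal NumPy sketch of these scaling forms (dimensions are illustrative); note that the diagonal matrices never need to be materialized, since $D_1 W D_2$ is just row and column broadcasting:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 64, 32
W = rng.standard_normal((n, m))   # frozen pretrained weight

# SSL trains d1 only (n scalars); SSB trains both d1 and d2 (n + m scalars).
d1 = np.ones(n)   # identity initialization: phi(W) starts equal to W
d2 = np.ones(m)

def ssb(W, d1, d2):
    """phi(W) = D1 W D2 with diagonal D1, D2, via broadcasting."""
    return d1[:, None] * W * d2[None, :]

assert np.allclose(ssb(W, d1, d2), W)   # identity init leaves W unchanged
print(n + m, n * m)  # 96 trainable scalars vs. 2048 for full fine-tuning
```

Even for this tiny matrix the trainable fraction is under 5%; at realistic layer widths it falls below 0.1%, matching the figures above.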

4. Comparative Theoretical Analysis

Empirical and theoretical insights indicate that not just low-rankness but also the form of decomposition and implicit pattern constraints critically influence the learning capability of PEFT schemes:

  • Expressivity Hierarchy: Adapters using $\Delta W = AGB$ (unconstrained $G$) outperform diagonal-constrained forms (AdaLoRA: $G$ diagonal; LoRA: $G = I$). Fewer coupling/orthogonality constraints allow $A$ and $B$ to adapt more freely to $W^*$, improving both optimization and convergence (Si et al., 2024).
  • Matrix Pattern Constraints (MPC): Adding explicit regularization (e.g., enforce (semi-)orthogonality or diagonal structure) can improve both performance and stability for low-rank PEFT methods, acting as a plug-and-play regularizer without extra parameters:
    • MPC$_o$: $R_o(A,B) = \|A^T A - I_r\|_F^2 + \|BB^T - I_r\|_F^2$
    • MPC$_d$: $R_d(A,B) = \|A^T A - I_r\|_F^2 + \|BB^T - \mathrm{diag}(BB^T)\|_F^2$
    • Adding these as regularizers yields up to 1–2 average GLUE points of improvement for LoRA-family methods (Si et al., 2024).
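The two regularizers transcribe directly into NumPy (shapes here are illustrative); as a sanity check, an $A$ with orthonormal columns and a $B$ with orthonormal rows incur zero $R_o$ penalty:

```python
import numpy as np

def mpc_o(A, B):
    """MPC_o: orthogonality regularizer R_o(A, B)."""
    r = A.shape[1]
    I = np.eye(r)
    return (np.linalg.norm(A.T @ A - I, "fro") ** 2
            + np.linalg.norm(B @ B.T - I, "fro") ** 2)

def mpc_d(A, B):
    """MPC_d: diagonal-pattern regularizer R_d(A, B)."""
    r = A.shape[1]
    BBt = B @ B.T
    return (np.linalg.norm(A.T @ A - np.eye(r), "fro") ** 2
            + np.linalg.norm(BBt - np.diag(np.diag(BBt)), "fro") ** 2)

rng = np.random.default_rng(0)
A, _ = np.linalg.qr(rng.standard_normal((16, 4)))  # 16 x 4, A^T A = I
Q, _ = np.linalg.qr(rng.standard_normal((12, 4)))
B = Q.T                                            # 4 x 12, B B^T = I
assert mpc_o(A, B) < 1e-12
```

In training, either penalty would simply be added to the task loss with a small weight; no new trainable parameters are introduced, consistent with the plug-and-play claim above.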

5. Empirical Results and Practical Trade-offs

Extensive experiments were conducted on foundation models such as RoBERTa-base/large and DeBERTaV3-base on the GLUE benchmark (nine tasks). Key findings:

  • SSL achieves 82–85% of full fine-tuning performance with just 0.02% of model parameters.
  • SSB bridges the performance gap to full fine-tuning (≈99% performance) at <0.1% parameter overhead.
  • FLoRA and unconstrained low-rank architectures outperform AdaLoRA and LoRA, confirming the importance of avoiding unnecessary matrix constraints.
  • MPC regularization delivers consistent 0.3–1.0 point improvements (GLUE average) and reduces variance for the LoRA family, without introducing new trainable parameters.

These empirical results directly inform several practical implications:

  • Subspace scaling approaches (SSL/SSB) are well suited to memory-constrained environments.
  • Matrix decomposition forms with relaxed constraints (e.g., FLoRA) facilitate faster convergence and higher final accuracy.
  • MPC regularization modules are orthogonal enhancements, compatible with all low-rank PEFT methods (Si et al., 2024).

6. Design Space and Strategy Patterns

Recent work has highlighted that the design space of PEFT encompasses four key axes: layer grouping, parameter allocation, tunable group selection, and strategy assignment. Greedy exploration reveals the following effective patterns in practice:

  • Spindle grouping: Finer granularity at input/output layers, wider groups in the middle.
  • Uniform allocation: Trainable parameter budgets uniformly distributed across groups.
  • Tune all groups: Consistent tuning across all layer groups yields superior performance.
  • Strategy assignment: Assigning adapters, prefix, BitFit, or LoRA in a group-wise manner—rather than uniformly—enables performance gains surpassing full fine-tuning in some regimes (Chen et al., 2023).
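The grouping and allocation patterns above can be sketched as follows (the group boundaries, depth, and budget are hypothetical, not taken from Chen et al., 2023):

```python
# Spindle grouping for a hypothetical 12-layer model: finer granularity
# at the input/output ends, one wider group in the middle.
groups = {
    "input":  [0, 1],
    "early":  [2, 3],
    "middle": [4, 5, 6, 7],
    "late":   [8, 9],
    "output": [10, 11],
}

budget = 100_000  # total trainable-parameter budget (illustrative)

# Uniform allocation: every group receives an equal share of the budget,
# and every group is tuned (no group is frozen out).
per_group = {name: budget // len(groups) for name in groups}
assert sum(per_group.values()) == budget
print(per_group["middle"])  # 20000
```

Group-wise strategy assignment would then attach a (possibly different) PEFT method, such as BitFit, LoRA, or an adapter, to each group within its share of the budget.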

PEFT strategies are thus not monolithic; optimal configurations depend on the foundation model’s structure, the downstream task, and resource constraints.

7. Future Directions and Open Challenges

Future research directions include:

  • Unified Decomposition Theory: Further theoretical examination of the correspondence between different PEFT forms under the lens of matrix decomposition, exploring the optimal subspace expansions for various architectures and tasks.
  • Automated Pattern Selection: Learning to adapt not only the parameter delta but also decomposition strategy, matrix pattern constraints, and module positioning (potentially through meta-learning or differentiable search).
  • Ultra-lightweight PEFT: Continued exploration of methods that drive parameter overheads toward negligible fractions—without loss in accuracy—by leveraging advanced subspace decompositions and task-aligned regularization.
  • Generalization Across Domains: Characterization of transferability and adaptation gaps for PEFT modules across tasks or model families, particularly how decomposition-based methods interact with architectures and pretraining biases.

The decomposition-centric view of PEFT introduced in (Si et al., 2024) provides not only a unified classification and analysis scheme but also inspires novel approaches—such as SSL and SSB—that push the frontier of parameter efficiency and adaptation capacity in large-scale foundation models.
