Parameter-Efficient Finetuning (PEFT)

Updated 19 March 2026

Parameter-Efficient Finetuning (PEFT) is a method that adapts large pre-trained models by updating only a small fraction of parameters through lightweight modules like adapters or low-rank matrices.
It reduces compute, memory, and storage costs by freezing most of the model and optimizing over a low-dimensional subspace tailored to task-specific improvements.
Empirical results show that PEFT methods achieve near full fine-tuning accuracy with significantly fewer parameters across diverse domains including NLP, vision, and multimodal tasks.

Parameter-Efficient Finetuning (PEFT) refers to a diverse set of strategies for adapting large pre-trained models to downstream tasks by introducing or updating only a small fraction of the model's total parameters. The typical approach is to freeze the majority of the model's weights and optimize lightweight task-specific structures (such as adapters, low-rank matrices, or sparse masks), which substantially reduces compute, memory, and storage costs, usually with little or no loss in downstream task performance. PEFT methods now constitute a fundamental adaptation paradigm across LLMs, vision transformers, multimodal models, and sequence models, achieving near parity with full-parameter fine-tuning in most scenarios while expanding the accessibility and scalability of large-scale deep learning.

1. Definition, Formalism, and Motivation

The defining feature of PEFT is the restriction of learnable parameters to a low-dimensional subspace or explicit subset. For a pre-trained model with parameters $\theta\in\mathbb{R}^p$ and a trainable update $\Delta\theta\in S$ (with $\dim(S)\ll p$), PEFT seeks

$\min_{\Delta\theta\in S}\ \mathbb{E}_{(x,y)}\;\mathcal{L}(f(x;\theta+\Delta\theta),\,y)$

with $\theta$ typically frozen and $\Delta\theta$ constrained to a structured, often low-rank or sparse, subspace. The parameter overhead is formalized as $\alpha = \frac{P_{\text{train}}}{P_{\text{total}}}\times 100\%$, where $P_{\text{train}}$ is the number of trainable parameters and $P_{\text{total}}$ is the total parameter count. High-performing methods achieve $\alpha$ well below 1% in state-of-the-art models such as GPT-3 (Zhang et al., 23 Jan 2025, Prottasha et al., 19 Apr 2025).
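The overhead formula can be checked with a few lines of arithmetic. The following is a minimal sketch; the function name `parameter_overhead` is illustrative, and the example figures (in millions of parameters) are the LoRA and full fine-tuning counts reported for LLaMA-2-7B in the table in Section 4.

```python
def parameter_overhead(trainable: float, total: float) -> float:
    """Return alpha = P_train / P_total * 100, i.e. a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return trainable / total * 100.0


# Figures in millions of parameters, taken from the Section 4 table
# (LoRA vs. full fine-tuning on LLaMA-2-7B).
alpha_lora = parameter_overhead(8.39, 6739.0)
print(f"alpha = {alpha_lora:.2f}%")
```

Because only the ratio matters, the two counts can be given in any common unit.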

Motivation: The drivers behind PEFT are threefold: reduction of hardware and time resources in training/serving large models, support for multi-task or federated deployment with minimal redundancy, and improved sample efficiency and generalization, particularly in low-data settings (Balne et al., 2024, Prottasha et al., 19 Apr 2025). Key advantages include fast convergence due to the low-dimensional subspace, drastically lower inference and training memory, plug-and-play modularity for downstream tasks, and the ability to scale adaptation across a multitude of user or task-specific profiles (Kwak et al., 2024).

2. Taxonomy of PEFT Approaches

PEFT design can be categorized into several principal families (Zhang et al., 23 Jan 2025, Prottasha et al., 19 Apr 2025):

Family	Mechanism	Typical Budget
Selective	Tune subsets of existing params (e.g., bias or layernorm, or masks)	≪1%
Additive	Insert adapters (bottleneck MLPs), stacking small learnable modules	~0.1–5%
Prompt-based	Learnable soft prompts or key/value tokens inserted into attention	0.01–5%
Reparameterized	Low-rank (LoRA, DoRA, SSB/SSL) or matrix product operator updates	0.02–1%
Sparse/Hybrid	Structured mask or combinations of the above in a unified pipeline	<0.5–10%

Selective: Only a fraction of parameters (e.g., biases—BitFit, layer norm scales, or a data-driven learned mask) are adapted. Classic methods yield $\alpha$ between 0.01–0.1% (Liao et al., 2023, Zhang et al., 23 Jan 2025).

Additive: Serial or parallel adapters, typically two-layer MLPs with bottleneck dimension $r$, are inserted into specific sublayers and trained, as in Houlsby et al. and AdaptFormer. These typically require parameter budgets of roughly 1–4% (Balne et al., 2024, Prottasha et al., 19 Apr 2025).

Prompt-Based: Learnable continuous tokens (“soft prompts”) are prepended at the input or to the key/value stacks of attention modules (prefix tuning). The parameter count is proportional to prompt length and hidden size (Balne et al., 2024).

Reparameterized: Imposes explicit low-rank structure on weight updates, e.g., LoRA-style $\Delta W = BA$ or more flexible decompositions such as AdaLoRA or FLoRA, with further regularization (“MPC”) possible (Si et al., 2024, Zhang et al., 23 Jan 2025).

Sparse/Hybrid: Approaches such as Diff-Pruning, PaFi-HiWi (Liao et al., 2023), X-PEFT (Kwak et al., 2024), and AdaPEFT (Xu et al., 18 May 2025) leverage mask-driven adaptation or optimize over combinations of PEFT modules per target budget.
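For the prompt-based family, the statement that the parameter count scales with prompt length and hidden size can be made concrete. This is a rough sketch under stated assumptions: input-level prompt tuning stores one hidden-size vector per prompt token, and prefix tuning stores a key prefix and a value prefix in every layer (ignoring any reparameterization network used during training). The function names and the configuration values are hypothetical.

```python
def prompt_tuning_params(prompt_len: int, hidden: int) -> int:
    """Soft prompt at the input: one hidden-size vector per prompt token."""
    return prompt_len * hidden


def prefix_tuning_params(prompt_len: int, hidden: int, num_layers: int) -> int:
    """Prefix tuning: a key prefix and a value prefix in each layer."""
    return 2 * num_layers * prompt_len * hidden


# Hypothetical configuration, chosen only for illustration.
hidden, num_layers, prompt_len = 768, 12, 20
n_prompt = prompt_tuning_params(prompt_len, hidden)
n_prefix = prefix_tuning_params(prompt_len, hidden, num_layers)
print(n_prompt, n_prefix)
```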

3. Core Methodologies and Theoretical Underpinnings

Adapter Modules

Adapters are small bottleneck MLPs, typically inserted after attention and/or feed-forward sublayers. With input $h\in\mathbb{R}^{d}$, adapters apply

$\text{Adapter}(h) = h + W_{\text{up}}\,\sigma(W_{\text{down}}\,h)$

with $W_{\text{down}}\in\mathbb{R}^{r\times d}$, $W_{\text{up}}\in\mathbb{R}^{d\times r}$, and $r\ll d$, where $\sigma$ is a pointwise nonlinearity. Only these matrices are trained; the base model is frozen (Balne et al., 2024, Prottasha et al., 19 Apr 2025).
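A bottleneck adapter can be sketched in plain Python. This is an illustrative sketch, not a reference implementation: it assumes the common residual form (down-projection, nonlinearity, up-projection, skip connection), uses ReLU as the nonlinearity, omits biases, and zero-initializes the up-projection so that the adapter starts as the identity. All names and sizes are hypothetical.

```python
import random


def matvec(M, x):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def adapter_forward(h, W_down, W_up):
    """Residual bottleneck adapter: h + W_up * relu(W_down * h)."""
    z = [max(0.0, v) for v in matvec(W_down, h)]   # r-dimensional bottleneck
    delta = matvec(W_up, z)                        # project back to d dimensions
    return [hi + di for hi, di in zip(h, delta)]   # skip connection


random.seed(0)
d, r = 8, 2                                        # hidden size and bottleneck size
W_down = [[random.gauss(0.0, 0.1) for _ in range(d)] for _ in range(r)]
W_up = [[0.0] * r for _ in range(d)]               # zero init (assumption): identity at start
h = [random.gauss(0.0, 1.0) for _ in range(d)]

out = adapter_forward(h, W_down, W_up)
adapter_param_count = 2 * d * r                    # two matrices, biases omitted
print(out == h, adapter_param_count)
```

With the up-projection at zero, the adapted model reproduces the frozen model exactly, so training starts from the pre-trained behavior.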

Low-Rank Adaptation (LoRA)

Instead of fully adapting $W\in\mathbb{R}^{d\times k}$, LoRA learns: $\Delta W = BA$ with $B\in\mathbb{R}^{d\times r}$ and $A\in\mathbb{R}^{r\times k}$, for small $r\ll\min(d,k)$. LoRA is typically applied to projections within attention or MLP sublayers, and parameter cost is $r(d+k)$ per adapted matrix (He, 2024, Zhang et al., 23 Jan 2025).
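The low-rank update and its parameter cost can be illustrated with a small sketch. It assumes the usual LoRA parameterization, a frozen weight of shape d × k plus a product of a d × r matrix and an r × k matrix, with the first factor zero-initialized so the update starts at zero. The configuration used for the final count (rank 16 on two 4096 × 4096 projections in each of 32 layers) is an assumption chosen because it reproduces the 8.39 M LoRA figure listed for LLaMA-2-7B in Section 4; the source does not state the configuration.

```python
import random


def matvec(M, x):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def lora_forward(x, W, A, B, scale=1.0):
    """Compute W x + scale * B (A x); W is frozen, A and B are trainable."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + scale * u for b, u in zip(base, update)]


def lora_params(d, k, r):
    """Trainable parameters for one adapted d x k matrix (B is d x r, A is r x k)."""
    return r * (d + k)


random.seed(0)
d, k, r = 6, 4, 2
W = [[random.gauss(0.0, 1.0) for _ in range(k)] for _ in range(d)]
A = [[random.gauss(0.0, 0.01) for _ in range(k)] for _ in range(r)]
B = [[0.0] * r for _ in range(d)]      # zero init: the update is exactly zero at start
x = [1.0, 2.0, 3.0, 4.0]

y = lora_forward(x, W, A, B)
y_base = matvec(W, x)

# Assumed configuration: 32 layers, 2 adapted 4096 x 4096 projections per layer, r = 16.
total_llama = 32 * 2 * lora_params(4096, 4096, 16)
print(y == y_base, total_llama)
```

After training, the product of the two factors can be added into the frozen weight once, so inference needs no extra matrix multiplications.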

Masked and Sparse Tuning

Selective approaches include BitFit (bias-only), sparse-masked updates using fixed or magnitude/pruned masks (PaFi), and methods such as X-PEFT which optimize binary or real-valued masks to select among a bank of adapters for extreme multi-profile deployment (Liao et al., 2023, Kwak et al., 2024).
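Selective tuning amounts to applying the optimizer update only where a mask allows it. The sketch below shows a BitFit-style bias-only mask and a single masked SGD step on toy values. Parameter names, the name-based selection rule, and the numbers are hypothetical; a data-driven or magnitude-based mask would replace `bias_only_mask` with a different selection rule.

```python
def bias_only_mask(param_names):
    """BitFit-style selection: only parameters whose name ends in 'bias' are trainable."""
    return {name: name.endswith("bias") for name in param_names}


def masked_sgd_step(params, grads, trainable, lr):
    """Apply one SGD step to trainable tensors; copy frozen ones unchanged."""
    updated = {}
    for name, values in params.items():
        if trainable[name]:
            updated[name] = [v - lr * g for v, g in zip(values, grads[name])]
        else:
            updated[name] = list(values)
    return updated


# Toy parameters and gradients, flattened to lists for simplicity.
params = {"layer0.weight": [0.3, -0.7, 1.2], "layer0.bias": [0.5, -0.5]}
grads = {"layer0.weight": [1.0, 1.0, 1.0], "layer0.bias": [1.0, -1.0]}

mask = bias_only_mask(params)
new_params = masked_sgd_step(params, grads, mask, lr=0.5)
print(new_params)
```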

Unified Decomposition Theory

Recent analysis (Si et al., 2024) frames all PEFT methods as subspace manipulation of each weight: either “reconstruction” (e.g., rescaling the frozen weight $W$) or “extension” (addition of a low-rank $\Delta W$), or both. The precise factorization, constraints (e.g., LoRA’s rank-$r$ factors in $\Delta W = BA$), and regularization (orthogonality or nonlinearity, as in MPC) affect both expressivity and trainability.

4. Parameter Efficiency, Computational Complexity, and Performance

PEFT methods achieve dramatic parameter and memory reductions:

Model	Full FT	LoRA	Adapter	RED (scaling/bias)
RoBERTa-base	125 M	0.3 M	0.4 M	0.02 M
GPT-2-medium	355 M	0.8 M	0.9 M	0.05 M
LLaMA-2-7B	6739 M	8.39 M	—	0.26 M

Parameter counts are from (Wu et al., 2024). Benchmarks reveal that with roughly two to four orders of magnitude fewer trainable parameters (in the table above, from about 300× fewer for adapters on RoBERTa-base to about 26,000× fewer for RED on LLaMA-2-7B), PEFT methods can match full fine-tuning within 0.2–0.6 accuracy points on GLUE, and even slightly surpass full fine-tuning in some instruction-tuning and human-preference settings. Similar trends hold across computer vision (e.g., AdaptFormer, PointGST (Liang et al., 2024)) and generative models (Prottasha et al., 19 Apr 2025).

Empirical ablations further show:

Adapter/LoRA performance scales monotonically with bottleneck/rank $r$, though with diminishing returns at the upper end of the tested range (up to 512) (He, 2024).
Prefix/prompt tuning's effectiveness often saturates quickly; prompt lengths must be sufficient to convey task nuances but higher values sometimes underperform deep-layer adaptation (Balne et al., 2024).
RED—representation editing by scaling and bias—yields order-of-magnitude greater parameter efficiency than LoRA/adapters, leveraging the geometry of feature manifolds with affine edits post-FFN to robustly steer representations (Wu et al., 2024).
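The RED figures in the table above are consistent with a simple count: one scaling vector and one bias vector of hidden size per layer. The sketch below illustrates the affine edit and that count. The per-layer layout is an assumption inferred from the totals, not a statement of the authors' implementation; the layer counts and hidden sizes are the standard published architecture values for these models.

```python
def red_edit(h, scale, bias):
    """Element-wise affine edit of a hidden representation: scale * h + bias."""
    return [s * v + b for s, v, b in zip(scale, h, bias)]


def red_params(num_layers, hidden):
    """Assumed layout: one scale vector and one bias vector per layer."""
    return 2 * num_layers * hidden


# Identity initialization: scale of ones and bias of zeros leave h unchanged.
h = [0.25, -1.5, 3.0]
unchanged = red_edit(h, [1.0] * 3, [0.0] * 3)

# A non-trivial edit on toy values.
edited = red_edit([1.0, 4.0], [2.0, 0.5], [0.5, -1.0])

counts = {
    "RoBERTa-base": red_params(12, 768),     # about 0.02 M
    "GPT-2-medium": red_params(24, 1024),    # about 0.05 M
    "LLaMA-2-7B": red_params(32, 4096),      # about 0.26 M
}
print(unchanged, edited, counts)
```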

5. Hyperparameter Sensitivity and Training Strategies

PEFT methods, particularly LoRA and adapters, are sensitive to hyperparameters:

Optimal learning rates cluster in a narrow range, but need to be lower for higher ranks/sizes to avoid instability (He, 2024).
In LoRA/adapter, rank/bottleneck dimension $r$ should be increased until validation performance plateaus, but high values exacerbate instability in small-data regimes.
RED and SSB (scaling-subspace-both) minimize hyperparameters to only layer selection and a global learning rate, further mitigating tuning cost (Wu et al., 2024, Si et al., 2024).
Light-PEFT (Gu et al., 2024) introduces early-stage pruning of both base model and PEFT modules, preserving plug-and-play modularity while compressing to ∼50–75% of full parameter count with minimal accuracy loss.
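The advice to raise the rank or bottleneck dimension until validation performance plateaus can be written as a simple search loop. This is a sketch under assumptions: `evaluate` stands in for a full train-and-validate run at a given rank, the tolerance `tol` is an arbitrary threshold, and the toy scores are invented solely to exercise the loop.

```python
from typing import Callable, Sequence


def select_rank(candidates: Sequence[int],
                evaluate: Callable[[int], float],
                tol: float = 0.002) -> int:
    """Return the last rank whose validation gain over the previous one exceeds tol."""
    best_rank = candidates[0]
    best_score = evaluate(best_rank)
    for r in candidates[1:]:
        score = evaluate(r)
        if score - best_score <= tol:   # plateau reached: stop increasing the rank
            break
        best_rank, best_score = r, score
    return best_rank


def toy_validation_score(r: int) -> float:
    """Invented scores; a real run would train and validate at rank r."""
    return {4: 0.80, 8: 0.84, 16: 0.86, 32: 0.861, 64: 0.8615}[r]


chosen = select_rank([4, 8, 16, 32, 64], toy_validation_score)
print(chosen)
```

Stopping at the first plateau also limits exposure to the instability that high ranks show in small-data regimes.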

Training best practices recommend:

Freezing normalization layers; only updating task-specific modules and heads.
Warmup schedules, 3–5% of total steps.
For multi-task or federated PEFT, use methods with task-agnostic masking or selection (PaFi, AdaPEFT (Xu et al., 18 May 2025)).
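A warmup covering 3–5% of total steps can be expressed as a step-to-learning-rate function. The sketch below uses linear warmup over 4% of steps followed by linear decay; the decay shape, the peak learning rate, and the function name are assumptions, since the recommendation above only specifies the warmup fraction.

```python
def lr_at_step(step: int, total_steps: int, peak_lr: float,
               warmup_frac: float = 0.04) -> float:
    """Linear warmup for warmup_frac of training, then linear decay (assumed shape)."""
    warmup_steps = max(1, round(total_steps * warmup_frac))
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    remaining = max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, (total_steps - step) / remaining)


total_steps, peak_lr = 1000, 1e-3      # hypothetical values
schedule = [lr_at_step(s, total_steps, peak_lr) for s in range(total_steps)]
print(schedule[0], max(schedule), schedule[-1])
```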

6. Domain-Specific Architectures and Extensions

PEFT extends beyond standard LLMs and ViTs:

Mixture-of-Experts (MoE) Models: PERFT introduces PEFT modules with custom routers over task-adaptive experts, outperforming MoE-agnostic variants while tuning <1% of parameters (Liu et al., 2024).
Sequence and SSM Models: For Mamba, PEFT adapts LoRA/adapters on SSM projections or introduces methods such as affix-tuning and additional-scan, outperforming Transformer baselines and illustrating architecture-specific adaptation (Yoshimura et al., 2024).
Vision/Segmentation: Cross-block orchestration and spectral adapters (e.g., PointGST) yield significant gains for structure-intensive tasks such as segmentation and point cloud analysis, with parameter counts <1% of full fine-tuning (Peng et al., 2023, Liang et al., 2024).
Extreme Multi-Profile: X-PEFT amortizes adapter banks across thousands of profiles, storing only two bitmasks per profile—enabling a 10,000× reduction in memory (Kwak et al., 2024).
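The scale of the per-profile saving from storing bitmasks instead of adapters can be estimated with back-of-the-envelope arithmetic. Everything in this sketch is a hypothetical illustration rather than the X-PEFT configuration: it assumes a per-profile adapter of 0.9 M float32 parameters as the baseline, and two bitmasks per layer over a shared bank of 100 adapters across 12 layers.

```python
def adapter_bytes(num_params: int, bytes_per_param: int = 4) -> int:
    """Storage for one dense per-profile adapter (float32 by default)."""
    return num_params * bytes_per_param


def bitmask_bytes(num_layers: int, bank_size: int, masks_per_layer: int = 2) -> int:
    """Storage for per-profile bitmasks: one bit per bank entry, per mask, per layer."""
    bits = num_layers * bank_size * masks_per_layer
    return (bits + 7) // 8              # round up to whole bytes


dense = adapter_bytes(900_000)          # hypothetical 0.9 M-parameter adapter
masks = bitmask_bytes(num_layers=12, bank_size=100)
reduction = dense / masks
print(dense, masks, reduction)
```

Under these assumed sizes the ratio lands in the same order of magnitude as the 10,000× figure quoted above; the shared adapter bank itself is stored once and amortized over all profiles.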

7. Applications, Empirical Results, and Open Challenges

PEFT has realized state-of-the-art or near full-fine-tuning results across domains: LLMs (classification, QA, instruction-following, code), vision (classification, segmentation), protein modeling, speech synthesis, and seismic inverse problems (Balne et al., 2024, Ghosal et al., 2024).

Key empirical insights:

For reasoning-intensive tasks, LoRA, adapters, and advanced subspace-tuning methods (SSB, FLoRA) reach 98–100% of full fine-tuning accuracy with <1% of the parameters (Si et al., 2024).
Knowledge-intensive adaptation (e.g., open-book QA) sometimes requires "large-capacity" adapters, motivating memory-efficient exo-GPU methods such as MEFT (Hao et al., 2024).
In low-resource translation, bottleneck adapters (Houlsby, Pfeiffer) often outperform full fine-tuning and other PEFT methods (Su et al., 2024).
PEFT methods can degrade high-resource language performance while improving low-resource accuracy in multilingual LLMs, indicating task- and domain-sensitivity (Aggarwal et al., 2024).

Open challenges and directions:

Reducing hyperparameter-tuning overhead and improving auto-selection of parameter subsets (meta-PEFT, AdaPEFT) (Xu et al., 18 May 2025).
Theoretical understanding of low-rank adaptation’s success: studies suggest subspace selection and minimal affine perturbation preserve pre-trained structure while allowing rapid task transfer (Wu et al., 2024, Si et al., 2024).
Robust PEFT in the face of domain shift, continual learning, and privacy constraints (Zhang et al., 23 Jan 2025, Prottasha et al., 19 Apr 2025).
Interpretability of adaptation modules and their internal mappings (Zhang et al., 23 Jan 2025).

PEFT continues to evolve as a foundational technique for sustainable and scalable adaptation of neural foundation models across tasks and modalities. Its ongoing development is marked by the interplay between advances in subspace theory, architecture-specific design, task-driven pruning, and new forms of task or domain conditioning. Performance gains, extreme efficiency, and flexible modularity underpin PEFT’s growing impact across research and real-world deployment (Prottasha et al., 19 Apr 2025, Wu et al., 2024, Si et al., 2024).