Parameter-Efficient Fine-Tuning (PEFT)

Updated 24 June 2025

Parameter-Efficient Fine-Tuning (PEFT) denotes a family of techniques designed to adapt large pre-trained models, such as LLMs and vision-language models, to downstream tasks by updating only a small fraction of their parameters. PEFT is motivated by the computational, storage, and practical constraints imposed by fully fine-tuning ever-larger models, and is structured around specific design patterns that trade off adaptation quality, efficiency, and inference characteristics (Pu et al., 2023; Zhang et al., 23 Jan 2025; Prottasha et al., 19 Apr 2025). The following article summarizes key PEFT categories, foundational mechanisms, comparative properties, domain-specific advancements, and ongoing challenges, as established in recent empirical and survey literature.

1. Categories and Mechanisms of PEFT

PEFT approaches are grouped into five principal classes: additive, selective, reparameterized, hybrid, and unified (MoE-based) strategies (Zhang et al., 23 Jan 2025; Prottasha et al., 19 Apr 2025).

Additive Methods:

These inject new, lightweight, trainable modules into a frozen pre-trained backbone. The canonical example is the adapter, typically implemented as a bottleneck MLP: $h' = h + W_{\text{up}}\,\sigma(W_{\text{down}} h)$, where $W_{\text{down}}$ projects to a lower-dimensional bottleneck and $W_{\text{up}}$ restores the original dimensionality. Variants include serial adapters (inserted sequentially), parallel adapters (outputs computed alongside the main layer), and multi-adapter fusions for multi-task settings. Additive methods permit modular deployment and low per-task parameter cost (roughly 1–5% of model size).
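
As a concrete illustration, the following is a minimal PyTorch sketch of a serial bottleneck adapter implementing the formula above; the class name, bottleneck width, and initialization are illustrative choices rather than a reference implementation.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Serial adapter: h' = h + W_up * sigma(W_down * h), trained with the backbone frozen."""
    def __init__(self, d_model: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)   # W_down: project to the bottleneck
        self.up = nn.Linear(bottleneck, d_model)     # W_up: restore dimensionality
        self.act = nn.GELU()
        nn.init.zeros_(self.up.weight)               # zero-init so the adapter starts as an identity
        nn.init.zeros_(self.up.bias)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return h + self.up(self.act(self.down(h)))   # residual bottleneck update
```

Zero-initializing $W_{\text{up}}$ makes the adapter an identity mapping at the start of fine-tuning, so training begins exactly from the pre-trained model's behavior.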

Selective Fine-Tuning:

Selective PEFT updates only predetermined or dynamically chosen parameter subsets. Strategies include:

  • BitFit: Tune only bias terms.
  • Automatic Masking: Use importance metrics (e.g., gradient norm, Fisher information) to select parameters. The set of fine-tuned parameters is $\theta_s = \{ \theta_i \mid C(\theta_i) \geq \tau \}$, where $C(\cdot)$ is an importance score and $\tau$ a threshold.

Selective methods can reach ultra-low parameter shares (<0.1%), but require careful selection logic; a minimal selection sketch follows this list.
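
The sketch below (PyTorch, illustrative) freezes everything and then re-enables either bias terms (BitFit) or tensors whose importance score clears a threshold. The gradient-magnitude score and tensor-level granularity are simplifications for illustration; true selective methods often mask individual scalar parameters.

```python
import torch
import torch.nn as nn

def select_trainable(model: nn.Module, mode: str = "bitfit", tau: float = 1e-4) -> None:
    """Freeze all parameters, then re-enable a small subset for fine-tuning."""
    for p in model.parameters():
        p.requires_grad = False

    if mode == "bitfit":
        # BitFit: tune only bias terms
        for name, p in model.named_parameters():
            if name.endswith(".bias"):
                p.requires_grad = True
    else:
        # Importance-based masking: C(theta) approximated here by mean gradient magnitude
        # (assumes a backward pass has already been run on a small calibration batch)
        for p in model.parameters():
            score = p.grad.abs().mean().item() if p.grad is not None else 0.0
            p.requires_grad = score >= tau
```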

Reparameterized Methods:

Reparameterization PEFT replaces standard parameter updates with low-rank or matrix-factorized updates: $\Delta W \approx AB$, with $A \in \mathbb{R}^{d \times r}$, $B \in \mathbb{R}^{r \times d}$, and $r \ll d$. For example, LoRA modifies model weights as $W' = W + AB$. Variants such as adaptive-rank LoRA (AdaLoRA, DyLoRA) allow further parameter savings, and QLoRA combines quantization with low-rank adaptation for additional efficiency.
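
A minimal PyTorch sketch of a LoRA-style wrapper following the $\Delta W \approx AB$ convention above; the class, initialization, and scaling are illustrative (production implementations such as the Hugging Face peft library add dropout, weight merging, and per-module targeting).

```python
import math
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update: W' = W + (alpha/r) * A @ B."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                               # freeze the pre-trained weights
        d_out, d_in = base.weight.shape                           # generalizes the square d x d case in the text
        self.A = nn.Parameter(torch.zeros(d_out, r))              # A: d_out x r, zero-init
        self.B = nn.Parameter(torch.randn(r, d_in) / math.sqrt(r))  # B: r x d_in, small random init
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        delta = (x @ self.B.T) @ self.A.T                         # low-rank path; A @ B is never materialized
        return self.base(x) + self.scale * delta
```

Because $A$ is zero-initialized, $\Delta W = 0$ at the start of training, and after training the factors can be merged into the base weight for zero added inference latency.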

Hybrid PEFT:

Hybrid strategies integrate several PEFT modules (e.g., adapters, LoRA, prompt tuning) within the same model. Methods like UniPELT learn mixture weights $\alpha_i$ for each module: $\theta_{\text{hybrid}} = \sum_{i=1}^n \alpha_i \theta_i$. These offer greater flexibility, robustness, and dynamic per-task adjustment.
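
A simplified PyTorch sketch of this mixture idea, blending arbitrary PEFT modules with learned weights $\alpha_i$; the static sigmoid gates are a simplification (UniPELT itself computes input-dependent gates), and all names are illustrative.

```python
import torch
import torch.nn as nn

class HybridPEFT(nn.Module):
    """Blend the residual updates of several PEFT modules with learned weights alpha_i."""
    def __init__(self, peft_modules: list[nn.Module]):
        super().__init__()
        self.peft_modules = nn.ModuleList(peft_modules)           # e.g. [adapter, lora, prompt module]
        self.gate_logits = nn.Parameter(torch.zeros(len(peft_modules)))

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        alphas = torch.sigmoid(self.gate_logits)                  # alpha_i in (0, 1), one per module
        # each module is assumed to return its *update* to h, not h itself
        update = sum(a * m(h) for a, m in zip(alphas, self.peft_modules))
        return h + update
```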

Unified/MoE-based Methods:

Unified frameworks employ a Mixture-of-Experts (MoE) approach, where a gating mechanism selects among several PEFT modules or experts for each input or task: $\Delta W = \sum_i \alpha_i A_i B_i$. This specialization allows high multi-task capacity and scalable sharing.
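
A minimal PyTorch sketch of a mixture of low-rank experts with an input-dependent router; the expert count, rank, and routing scheme are illustrative assumptions, and real systems typically add top-k routing and load-balancing losses.

```python
import torch
import torch.nn as nn

class MoELoRA(nn.Module):
    """Input-routed mixture of low-rank experts: Delta W = sum_i alpha_i(x) A_i B_i."""
    def __init__(self, d_in: int, d_out: int, n_experts: int = 4, r: int = 8):
        super().__init__()
        self.A = nn.Parameter(torch.zeros(n_experts, d_out, r))   # zero-init => Delta W = 0 at start
        self.B = nn.Parameter(torch.randn(n_experts, r, d_in) * 0.01)
        self.router = nn.Linear(d_in, n_experts)                  # gating network

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        alpha = torch.softmax(self.router(x), dim=-1)             # (..., n_experts)
        z = torch.einsum("...d,erd->...er", x, self.B)            # project with each B_i
        y = torch.einsum("...er,eor->...eo", z, self.A)           # expand with each A_i
        # weighted sum over experts; returns only the low-rank update,
        # which would be added to the frozen base layer's output
        return torch.einsum("...e,...eo->...o", alpha, y)
```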

2. Resource Implications and Empirical Trade-Offs

PEFT methods deliver pronounced resource efficiency compared to full fine-tuning. The typical trainable parameter fractions are as follows (Prottasha et al., 19 Apr 2025); a quick check of this fraction for a concrete model is sketched after the list:

  • Additive/Reparameterized: 0.1–5% (often ≪1%)
  • Selective (e.g., BitFit): <1%
  • Prompt tuning: 0.01–4% (proportional to prompt length and embedding size)
  • Hybrid/Unified: Varies, but remains substantially smaller than full FT
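
These fractions are straightforward to verify for any wrapped model; a minimal PyTorch helper (illustrative, not tied to any particular PEFT library):

```python
import torch.nn as nn

def trainable_fraction(model: nn.Module) -> float:
    """Fraction of parameters that will actually receive gradient updates."""
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    return trainable / total

# After applying a PEFT method (adapters, LoRA, BitFit, ...), this value
# should fall roughly within the ranges listed above.
```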

Performance often closely approaches (and can sometimes surpass) that of full fine-tuning on well-matched downstream tasks, especially for large models or in moderate-data regimes (Pu et al., 2023; Zhang et al., 23 Jan 2025). PEFT also enables:

  • Faster training and lower VRAM use, particularly with adapters and LoRA
  • Modular multi-task deployment, with per-task PEFT modules being hot-swappable (see the sketch after this list)
  • Improved generalization and robustness to overfitting and catastrophic forgetting in many empirical studies
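
As an example of such hot-swapping, a sketch assuming the Hugging Face peft library's multi-adapter interface; the model identifier and adapter paths are hypothetical placeholders.

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the frozen backbone once (placeholder model id)
base = AutoModelForCausalLM.from_pretrained("base-model-name")

# Attach one adapter/LoRA checkpoint per task (hypothetical local paths)
model = PeftModel.from_pretrained(base, "adapters/task_a", adapter_name="task_a")
model.load_adapter("adapters/task_b", adapter_name="task_b")

model.set_adapter("task_a")   # serve task A
# ... run task-A requests ...
model.set_adapter("task_b")   # switch tasks without reloading the backbone
```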

A representative table summarizes empirical findings in LLMs (Pu et al., 2023):

| Method | Parameters Modified | Efficiency | Performance (Small Data) | Performance (Large Data) | Notes |
|---|---|---|---|---|---|
| LoRA | Attention weights | Moderate | High | High | Flexible submodule selection |
| $(\mathrm{IA})^3$ | Attention/dense scaling | Best | Moderate–High | High | Smallest parameter/memory overhead |
| BitFit | Only biases | High | High | Medium | Simplicity; best for classification |
| Prompt Tuning | Input embeddings | Best | Low | Medium | Low performance on small-data tasks |

3. Practical Implementation and Application Domains

PEFT techniques have been successfully applied across domains (Zhang et al., 23 Jan 2025; Balne et al., 21 Apr 2024; Prottasha et al., 19 Apr 2025):

  • LLMs: Adapters and LoRA dominate LLM fine-tuning, instruction tuning, and multilingual adaptation. QLoRA is widely used with quantized models to lower hardware requirements (a configuration sketch follows this list).
  • Vision & Multimodal: ViT-based models use adapters (AdaptFormer, Convpass), VPT (Visual Prompt Tuning), and reparameterized strategies; diffusion/generative models employ LoRA and adapter-based tuning for task transfer and concept fusion.
  • Low-Resource & Multitask Settings: PEFT supports edge deployment, federated adaptation (e.g., via sparse, task-agnostic masks (Liao et al., 2023)), and scalable profile-based adaptation (Kwak et al., 29 Jan 2024).
  • Other Domains: Successes are noted in medical imaging, protein modeling, 3D point/graph representation (e.g., PointGST (Liang et al., 10 Oct 2024)), and geospatial foundation models (Marti-Escofet et al., 24 Apr 2025).
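
A minimal QLoRA-style configuration sketch, assuming the Hugging Face transformers, peft, and bitsandbytes stack; the model id, target modules, and hyperparameters are illustrative rather than a prescribed recipe.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize the frozen base model to 4-bit
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained("base-model-name", quantization_config=bnb_config)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],    # which projections receive low-rank updates
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()          # typically well under 1% of total parameters
```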

4. Advanced Methodological Trends and Optimizations

Recent work introduces advanced PEFT search and optimization strategies to address the vast design space of PEFT module types, placements, and hyperparameters:

  • Automated Search Frameworks: BIPEFT (Chang et al., 4 Oct 2024) and PrunePEFT (Yu et al., 9 Jun 2025) iteratively search the PEFT design/configuration space, balancing parameter budgets, adaptation targets, and search efficiency.
  • Decomposition & Subspace Theories: Rigorous mathematical frameworks unify the landscape under subspace projection and matrix decomposition (Si et al., 7 Jul 2024), showing that performance depends on how PEFT alters the model's singular vectors, not just on parameter count (a toy illustration follows this list).
  • Latency/Storage-Efficient Variants: Approaches like PaFi/HiWi (Liao et al., 2023) eliminate extra inference latency by merging adaptation directly into model weights via data-less, task-agnostic masking or weight updates.
  • Cross-Block and Domain-Specific Innovations: Mechanisms such as cross-block orchestration (Peng et al., 2023) or spectral-domain adapters for 3D/point cloud data (Liang et al., 10 Oct 2024) address unique domain and modality challenges.
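
To make the subspace framing concrete, a toy PyTorch illustration on random matrices (purely for intuition; not code from the cited work) measures how much a low-rank update rotates the dominant singular subspace of a weight matrix.

```python
import torch

d, r, k = 512, 8, 16
W = torch.randn(d, d)                        # stand-in for a pre-trained weight matrix
A, B = torch.randn(d, r) * 0.1, torch.randn(r, d) * 0.1
W_adapted = W + A @ B                        # low-rank PEFT-style update

U, _, _ = torch.linalg.svd(W)
U_new, _, _ = torch.linalg.svd(W_adapted)
# Overlap between the top-k left singular subspaces (1.0 = directions unchanged)
overlap = torch.linalg.matrix_norm(U[:, :k].T @ U_new[:, :k]) ** 2 / k
print(f"top-{k} subspace overlap after low-rank update: {overlap.item():.3f}")
```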

5. Impact, Limitations, and Evaluation

Extensive benchmarks demonstrate that state-of-the-art PEFT methods can deliver near-equal or, in selected OOD or low-resource settings, superior results versus fully fine-tuned models, with parameter savings of up to 100x or more (Pu et al., 2023; Aggarwal et al., 15 Jan 2024; Ghosal et al., 27 Dec 2024). For instance, on the FLAN-T5 LLM (Pu et al., 2023), tuning under 1% of parameters was sufficient for competitive accuracy on classification and generation, and selectively tuning a subset of layers could maintain or improve results with roughly half the typical PEFT parameter budget.

However, several caveats are identified:

  • Convergence Speed: PEFT methods generally converge more slowly than full fine-tuning, especially in low-data regimes.
  • Task & Data Dependence: Optimal PEFT choice depends on the downstream task, available training samples, and memory constraints.
  • Expressiveness Limits: Certain settings (complex reasoning, long-form generation) may still favor full fine-tuning or require further PEFT advances (He, 25 Nov 2024).

Relevant evaluation metrics include model accuracy, performance-per-parameter, run-time efficiency, and memory/storage usage, e.g. $\text{Perf/Param} = \frac{\text{Accuracy or ROUGE-L}}{\text{Number of Trained Parameters}}$, together with $\Delta W = AB$ for LoRA, where $A$ and $B$ are the low-rank matrices.

6. Future Directions and Open Challenges

Open research questions include (Zhang et al., 23 Jan 2025; Prottasha et al., 19 Apr 2025):

  • Scalability: Coping with ever-larger base models and multi-task deployments.
  • Interpretability: Understanding the functional role of PEFT modules and their adaptation mechanisms.
  • Unified Benchmarks: The need for standardized, cross-domain PEFT evaluation.
  • Federated, Privacy-Aware, and Continual Learning: Developing PEFT methods that seamlessly support distributed and lifelong adaptation.
  • Theoretical Foundations & Scaling Laws: Clarifying when parameter savings saturate and how subspace adaptation yields generalization.

7. Summary Table: PEFT Types and Properties

| Type | Key Mechanism/Formulation | Typical Params | Main Strengths |
|---|---|---|---|
| Additive | Insert adapters: $h' = h + h_{\text{adapter}}$ | 1–5% | Modular, task-specific, robust |
| Selective | Update selected $\theta_s$ (mask/importance-based) | <1% | Ultra-lightweight, profile/task-specific |
| Reparameterized | Low-rank update: $\Delta W = AB$ | <1% | High efficiency, tunable granularity |
| Hybrid | Combine modules: $\theta_{\text{hybrid}} = \sum \alpha_i \theta_i$ | Varies | Transferability, robustness |
| MoE/Unified | Expert mixture: $\Delta W = \sum \alpha_i A_i B_i$ | Task/route-specific | Scalable, multi-task, modular |

Extensive empirical and theoretical advances have cemented PEFT as a central technology for deploying and scaling foundation models. The mature landscape reflects a convergence around a modular, theoretically principled, and domain-aware approach to efficient model adaptation, while continuing to evolve in response to emerging scale, generalization, and deployment challenges.