
Parameter-Efficient Fine-tuning

Updated 2 June 2026
  • Parameter-efficient fine-tuning is a set of methods that adapts large pre-trained models by updating only a small subset of parameters, enhancing efficiency and scalability.
  • These techniques employ low-rank reparameterization, adapter insertion, and selective parameter updates to significantly cut computational and storage costs while preserving performance.
  • Recent empirical results across NLP, vision, and scientific domains demonstrate that PEFT can achieve near full fine-tuning accuracy with dramatically fewer trainable parameters.

Parameter-efficient fine-tuning (PEFT) is a family of techniques that adapt large pre-trained models to downstream tasks while training only a small subset of parameters, leaving the majority of the model weights unchanged. By decoupling task-specific adaptation from the bulk of the pretrained backbone, PEFT can match, and in some reported cases exceed, the performance of full fine-tuning while sharply reducing trainable parameters, storage, and computational overhead. Recent developments span low-rank reparameterizations, frequency-domain adaptation, structural module insertion, and parameter selection schemes, enabling broad applicability across language, vision, medical imaging, and scientific domains.

1. Conceptual Foundations and Motivations

The motivation for PEFT arises from the prohibitive cost and redundancy of full fine-tuning in models that may have hundreds of millions or billions of parameters. Full fine-tuning not only produces a task-specific checkpoint of size $\mathcal{O}(P)$ per task (where $P$ is the full parameter count) but also strains storage, communication, and on-device inference resources, especially on limited hardware or in federated settings (Balne et al., 2024, Zhang et al., 23 Jan 2025). PEFT addresses these challenges by updating only a targeted subset of parameters or lightweight task-specific modules, typically $\ll 1\%$ of the model, thus enabling:

  • Storage efficiency: only the adapter weights or sparse parameter deltas need to be saved per task.
  • Rapid adaptation: much lower gradient memory and per-step compute cost during training.
  • Improved generalization and robustness, as parameter sparsity often regularizes the adaptation and stabilizes fine-tuning dynamics (Fu et al., 2022).

2. Methodological Taxonomy and Design Principles

PEFT methods can be systematically classified by the mechanism used to inject task-specific capacity:

  1. Selection-based/sparse tuning: Only a carefully chosen subset of the existing weights or biases is updated, with all other parameters frozen. Classic examples include BitFit (bias-only), LayerNorm-only tuning, and gradient- or Fisher-informed parameter masks (ValizadehAslani et al., 2024, Fu et al., 2022, Liao et al., 2023).
  2. Insertion-based approaches: Lightweight neural modules (adapters) are inserted at each layer or sub-layer of the model. These typically follow a bottleneck architecture: down-projection, nonlinearity, up-projection, and a residual addition (Chen et al., 2023, Baker et al., 1 Jun 2026).
  3. Reparameterization-based approaches: Model weight updates are parameterized as constrained transformations, typically low-rank matrices (LoRA, AdaLoRA, DoRA, PiCa, FLoRA, SVDiff), frequency-domain coefficients (sDCTFT, FourierFT, CDVFT), or structured matrix factorizations (e.g., column/row projections, circulant-diagonal products) (Shen et al., 2024, Hwang et al., 26 May 2025).
  4. Prompt/prefix tuning: Learnable input or intermediate sequence tokens are optimized (soft prompts), which control the network computation without altering model weights directly (Balne et al., 2024, Zhang et al., 23 Jan 2025).
  5. Hybrid or automated approaches: PEFT modules are assigned adaptively across layer groups, sometimes with meta-learned structure, e.g., S⁴ designs that greedily optimize layer grouping, allocation, and plug-in strategy (Chen et al., 2023, Zhang et al., 23 Jan 2025).
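The prompt-tuning entry in the list above can be illustrated with a minimal NumPy sketch. It assumes the soft prompt is a small trainable matrix prepended to the frozen token embeddings; the function name `prepend_soft_prompt` and all shapes are hypothetical and not taken from the cited works.

```python
import numpy as np

def prepend_soft_prompt(token_embeddings, soft_prompt):
    """Prepend trainable prompt vectors to every sequence in a batch.

    token_embeddings: (batch, seq_len, dim), produced by the frozen model.
    soft_prompt:      (n_prompt, dim), the only trainable tensor in this sketch.
    Returns an array of shape (batch, n_prompt + seq_len, dim).
    """
    batch = token_embeddings.shape[0]
    # Share the same prompt across the batch without copying it per example.
    prompt = np.broadcast_to(soft_prompt, (batch,) + soft_prompt.shape)
    return np.concatenate([prompt, token_embeddings], axis=1)

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(2, 5, 8))   # stand-in for frozen embedding output
soft_prompt = rng.normal(size=(3, 8))     # 3 prompt tokens, 24 trainable values
extended = prepend_soft_prompt(embeddings, soft_prompt)
```

Only `soft_prompt` would receive gradients in training; the model weights that consume `extended` stay frozen.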

3. Core Algorithms and Mathematical Formulations

Low-Rank Adaptation (LoRA)

LoRA freezes the original weight $W_0$ and introduces a low-rank update $\Delta W = BA$, where $B\in\mathbb{R}^{d\times r}$, $A\in\mathbb{R}^{r\times k}$, and $r \ll \min(d,k)$. Only $A$ and $B$ are trainable. This reduces the number of trained parameters from $\mathcal{O}(dk)$ to $\mathcal{O}(r(d+k))$ per layer (Chen et al., 2023, Baker et al., 1 Jun 2026).
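A minimal NumPy sketch of the LoRA update for a single linear layer follows. The `scale` argument and the zero initialization of `B` are common conventions assumed here rather than details stated in this article, and the dimensions are arbitrary.

```python
import numpy as np

def lora_forward(x, W0, A, B, scale=1.0):
    """Apply the frozen weight W0 plus the low-rank update B @ A to inputs x.

    x: (batch, k), W0: (d, k), B: (d, r), A: (r, k). Returns (batch, d).
    """
    # The low-rank path is computed without materializing the d x k update.
    return x @ W0.T + scale * (x @ A.T) @ B.T

d, k, r = 64, 48, 4
rng = np.random.default_rng(0)
W0 = rng.normal(size=(d, k))          # frozen pretrained weight
A = rng.normal(size=(r, k)) * 0.01    # trainable, small random init (assumed)
B = np.zeros((d, r))                  # trainable, zero init so the update starts at 0
x = rng.normal(size=(8, k))

y = lora_forward(x, W0, A, B)
full_params = d * k          # trained by full fine-tuning of this layer
lora_params = r * (d + k)    # trained by LoRA for this layer
```

With these sizes, the layer trains 448 parameters instead of 3072, and the output equals that of the frozen layer until `B` moves away from zero.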

Frequency-Domain Fine-Tuning

Recent work moves PEFT to the frequency domain, exploiting the energy compaction and decorrelation properties of the Discrete Cosine Transform (DCT). Selective DCT Fine-Tuning (sDCTFT) projects the LoRA-style weight change into DCT space, partitions it into frequency bands, and updates only the high-energy, information-rich coefficients (Shen et al., 2024). The inverse DCT reconstructs the dense delta at each pass.
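The idea of training a few DCT coefficients and recovering a dense weight delta through the inverse transform can be sketched in NumPy as follows. This is a simplified illustration, not the sDCTFT algorithm: it keeps the largest-magnitude coefficients globally and omits the band partitioning described above. All function names are hypothetical.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix C of shape (n, n), so that C @ C.T = I."""
    i = np.arange(n)[:, None]   # frequency index
    j = np.arange(n)[None, :]   # sample index
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * j + 1) * i / (2 * n))
    C[0, :] = np.sqrt(1.0 / n)
    return C

def select_top_coefficients(delta, n_keep):
    """Return the 2-D DCT of delta and the flat indices of its n_keep largest entries."""
    Cd, Ck = dct_matrix(delta.shape[0]), dct_matrix(delta.shape[1])
    F = Cd @ delta @ Ck.T
    idx = np.argsort(np.abs(F).ravel())[-n_keep:]
    return F, idx

def reconstruct_delta(values, idx, shape):
    """Scatter kept coefficients into a zero spectrum and apply the inverse 2-D DCT."""
    F = np.zeros(shape)
    F.flat[idx] = values
    Cd, Ck = dct_matrix(shape[0]), dct_matrix(shape[1])
    return Cd.T @ F @ Ck

# Toy weight delta that is exactly sparse in DCT space (5 nonzero coefficients).
d, k, n_keep = 16, 12, 5
rng = np.random.default_rng(0)
true_F = np.zeros((d, k))
true_F.flat[rng.choice(d * k, size=n_keep, replace=False)] = np.arange(1.0, n_keep + 1)
delta = dct_matrix(d).T @ true_F @ dct_matrix(k)

F, idx = select_top_coefficients(delta, n_keep)
recon = reconstruct_delta(F.ravel()[idx], idx, delta.shape)
```

In this toy case the five stored values reproduce all 192 entries of `delta`; real weight updates are only approximately sparse, so the number of kept coefficients trades accuracy against parameter count.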

Sparse Parameter Selection

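Selection-based PEFT includes bias-only tuning (BitFit), LayerNorm-only tuning, and gradient- or Fisher-informed parameter masks, as outlined in Section 2.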
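As one illustration, the sketch below builds two kinds of selection mask in NumPy and applies a masked gradient step. It assumes parameters are stored in a dictionary keyed by name and uses squared gradients as a crude Fisher-style importance score. The helper names are hypothetical, and the cited methods differ in how scores are estimated.

```python
import numpy as np

def bias_only_mask(params):
    """BitFit-style selection: only parameters whose name ends in 'bias' are trainable."""
    return {name: np.full(p.shape, name.endswith("bias")) for name, p in params.items()}

def top_k_score_mask(scores, k):
    """Mark the k highest-scoring entries across all tensors as trainable.

    Ties at the threshold would select more than k entries.
    """
    flat = np.concatenate([s.ravel() for s in scores.values()])
    threshold = np.sort(flat)[-k]
    return {name: s >= threshold for name, s in scores.items()}

def masked_sgd_step(params, grads, mask, lr=0.1):
    """Update only the selected entries; frozen entries keep their values."""
    return {n: params[n] - lr * grads[n] * mask[n] for n in params}

rng = np.random.default_rng(0)
params = {"layer.weight": rng.normal(size=(3, 3)), "layer.bias": rng.normal(size=3)}
grads = {n: rng.normal(size=p.shape) for n, p in params.items()}

bitfit = masked_sgd_step(params, grads, bias_only_mask(params))
scores = {n: g ** 2 for n, g in grads.items()}   # squared-gradient importance proxy
sparse = masked_sgd_step(params, grads, top_k_score_mask(scores, k=4))
```

Only the masked entries need to be stored per task, which is the source of the storage savings for this family of methods.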

Adapter Architectures

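Adapters follow the form $h \leftarrow h + W_{\text{up}}\,\sigma(W_{\text{down}} h)$ with $W_{\text{down}}\in\mathbb{R}^{r\times d}$ (down-projection to bottleneck), $W_{\text{up}}\in\mathbb{R}^{d\times r}$ (up-projection), and $r \ll d$ (Chen et al., 2023, Baker et al., 1 Jun 2026).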
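A bottleneck adapter of this kind can be sketched in a few lines of NumPy. The names `W_down` and `W_up`, the choice of ReLU as the nonlinearity, the omission of bias terms, and the zero initialization of the up-projection are assumptions made for illustration.

```python
import numpy as np

def adapter_forward(h, W_down, W_up):
    """Residual bottleneck adapter.

    h: (batch, d) hidden states from the frozen layer.
    W_down: (r, d) down-projection, W_up: (d, r) up-projection, with r << d.
    """
    z = np.maximum(h @ W_down.T, 0.0)   # down-project, then ReLU (assumed nonlinearity)
    return h + z @ W_up.T               # up-project and add the residual

d, r = 32, 4
rng = np.random.default_rng(0)
h = rng.normal(size=(8, d))
W_down = rng.normal(size=(r, d)) * 0.1
W_up = np.zeros((d, r))                 # zero init: the adapter starts as the identity

out = adapter_forward(h, W_down, W_up)
adapter_params = 2 * d * r              # trainable values added per adapter
```

Starting from an identity mapping means inserting the adapter does not change the pretrained model's outputs before training begins.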

Representation Editing

Instead of tuning weights, RED ("Representation Editing") modifies hidden representations: $h' = l_{\text{scale}} \odot h + l_{\text{bias}}$, where only the vectors $l_{\text{scale}}$ and $l_{\text{bias}}$ are trained per layer (Wu et al., 2024).
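A minimal sketch of representation editing follows, assuming the edit is an elementwise scale and bias applied to each hidden vector. The function name `edit_representation` and the identity initialization are illustrative assumptions, not details confirmed by this article.

```python
import numpy as np

def edit_representation(h, scale, bias):
    """Rescale and shift hidden states elementwise; the layer weights stay frozen.

    h: (batch, d); scale and bias: (d,), the only trainable vectors in this sketch.
    """
    return h * scale + bias

d = 16
rng = np.random.default_rng(0)
h = rng.normal(size=(4, d))
scale = np.ones(d)     # identity initialization (assumed)
bias = np.zeros(d)

edited = edit_representation(h, scale, bias)
params_per_layer = scale.size + bias.size   # 2 * d trainable values
```

Under this assumption the trainable parameter count grows linearly with the hidden width, rather than with the product of two weight dimensions.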

4. Empirical Performance and Efficiency

PEFT methods consistently deliver competitive results across NLP, vision, protein modeling, and scientific tasks:

  • On GLUE (RoBERTa, T5, BERT), LoRA and sDCTFT with 0.03–0.05M trainable parameters match or surpass full fine-tuning of 125M+ parameters; sDCTFT is reported to cut the parameter count by a factor on the order of 760 relative to LoRA on LLaMA3.1-8B (Shen et al., 2024, Chen et al., 2023, ValizadehAslani et al., 2024).
  • In instance segmentation, LoRA and adapters tune only 1–6% of the model and reach 95–98% of full-tuning AP, with LoRA excelling in low distribution shift settings, while adapters offer higher capacity for complex, structured domain shifts (Baker et al., 1 Jun 2026).
  • On low-resource machine translation (mBART-50), Houlsby+Inversion adapters and mix-and-match PEFT variants yield 10–40% BLEU gains over full fine-tuning baselines, with just 4–9% parameter overhead (Su et al., 2024).
  • In seismic full-waveform inversion, LoRA-PEFT reduces per-task adaptation cost while matching full fine-tuning and increasing OOD robustness (Ghosal et al., 2024).
  • For time series foundation models, TRACE introduces Gated DSIC masking and low-rank heads to match or exceed full fine-tuning with fewer than 3% of the parameters (Li et al., 21 Mar 2025).
  • Across 24 image-classification and transfer tasks, FPS achieves mean accuracies close to the state of the art, with lower peak memory and lower selection latency than gradient-based selection (Yang et al., 31 Oct 2025).
  • Data-driven selection methods (IRD) outperform random sampling in identifying which parameters should be tuned for a given sample distribution, optimizing GLUE performance under strong sparsity (Dong et al., 2024).

5. Theoretical Insights and Design Patterns

Parameter sparsity has a regularizing effect on stability and generalization in fine-tuning: by freezing the majority of weights, PEFT reduces the sensitivity of the output to data perturbation, resulting in lower variance and sometimes better generalization than dense fine-tuning (Fu et al., 2022). Analytical results link PEFT selection to implicit quadratic regularization on non-updated parameters and establish that optimal parameter selection is NP-hard, motivating gradient- or second-order-based heuristics such as the Second-order Approximation Method (SAM) (Fu et al., 2022).

Automated PEFT design spaces reveal robust patterns: "spindle" layer grouping (fewer adapters at input/output, more in the middle), uniform parameter allocation per group, all-groups-tuned, and group-specific strategy assignment yield superior multi-task and cross-backbone transfer (Chen et al., 2023).

Decomposition-centric analysis shows that all PEFT methods can be reframed as subspace modification or extension via low-rank or structured basis adaptation, with extension methods (e.g., FLoRA with unconstrained intermediate transformations) empirically outperforming constrained factorizations (LoRA, AdaLoRA) due to fewer coupling constraints (Si et al., 2024).

6. Limitations, Use-Case Specificity, and Practical Recommendations

PEFT strategy effectiveness is context-dependent:

  • LoRA and related decompositions are efficient and performant in moderate domain-shift, low-latency settings (on-device and scalable serving), but may underfit when training data is abundant or the domain is highly nonstationary.
  • Adapter bottlenecks or multi-head hybrid PEFT strategies excel where task adaptation requires nonlinear capacity or significant feature transformation (medical imaging, scientific data, highly structured vision tasks) (Baker et al., 1 Jun 2026, Balne et al., 2024).
  • Frequency-domain adaptations (sDCTFT, CDVFT) exploit gradient sparsity and spatial-frequency structure for dramatic compression and are well-suited to vision and large-scale LLMs with spectral compression properties (Shen et al., 2024, Hwang et al., 26 May 2025).
  • Selection-based methods (PaFi, LayerNorm-only, BitFit) reach near full-fine-tune performance in regimes with strong over-parameterization, especially where only a minimal steer is needed (ValizadehAslani et al., 2024, Liao et al., 2023).

Best practices include always tuning LayerNorm parameters in transformers, using group-wise or Fisher-ranked parameter selection, validating adapter size or LoRA rank on held-out data, and considering hybrid or automated design-space methods for heterogeneous or multitask adaptation (Zhang et al., 23 Jan 2025, Chen et al., 2023, ValizadehAslani et al., 2024).

7. Future Directions and Open Problems

Future research in PEFT is poised to address open challenges.

Parameter-efficient fine-tuning remains a critical enabler of scalable, sustainable, and versatile transfer learning in state-of-the-art foundation models, continually advancing in sophistication and breadth of application (Zhang et al., 23 Jan 2025, Balne et al., 2024, Shen et al., 2024, Chen et al., 2023, Fu et al., 2022).
