Parameter-Efficient Fine-tuning
- Parameter-efficient fine-tuning is a set of methods that adapt large pre-trained models by updating only a small subset of parameters, improving efficiency and scalability.
- These techniques employ low-rank reparameterization, adapter insertion, and selective parameter updates to significantly cut computational and storage costs while preserving performance.
- Recent empirical results across NLP, vision, and scientific domains show that PEFT can reach accuracy close to that of full fine-tuning with dramatically fewer trainable parameters.
Parameter-efficient fine-tuning (PEFT) is a family of techniques that adapt large pre-trained models to downstream tasks while training only a small subset of parameters, leaving the majority of the model weights unchanged. By decoupling task-specific adaptation from the bulk of the pretrained backbone, PEFT often matches, and sometimes exceeds, the performance of full fine-tuning while sharply reducing trainable parameters, storage, and computational overhead. Recent developments span low-rank reparameterization, frequency-domain adaptation, structural module insertion, and sophisticated parameter selection schemes, and have been applied across language, vision, medical imaging, and scientific domains.
1. Conceptual Foundations and Motivations
The motivation for PEFT arises from the prohibitive cost and redundancy of full fine-tuning in models that may have hundreds of millions or billions of parameters. Full fine-tuning not only produces a task-specific checkpoint of size $\mathcal{O}(N)$ per task (where $N$ is the full parameter count) but also strains storage, communication, and on-device inference resources, especially in scenarios with limited hardware or in federated settings (Balne et al., 2024, Zhang et al., 23 Jan 2025). PEFT addresses these challenges by updating only a targeted subset of parameters or lightweight task-specific modules, typically on the order of 1% of the model or less, thus enabling:
- Storage efficiency: only the adapter weights or sparse parameter deltas need to be saved per task.
- Rapid adaptation: much lower gradient memory and per-step compute cost during training.
- Improved generalization and robustness, as parameter sparsity often regularizes the adaptation and stabilizes fine-tuning dynamics (Fu et al., 2022).
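The storage argument is simple arithmetic: full fine-tuning stores one complete copy of the model per task, whereas PEFT stores one shared frozen backbone plus a small module per task. The sketch below assumes an illustrative 125M-parameter backbone, 10 tasks, and per-task modules of 1% of the backbone; these numbers are hypothetical, not results from the cited papers.

```python
def total_stored_params(num_tasks: int, backbone_params: int,
                        peft_fraction: float) -> tuple[int, int]:
    """Compare parameters stored for full fine-tuning vs. PEFT across tasks.

    Full fine-tuning keeps one complete copy of the model per task.
    PEFT keeps one shared frozen backbone plus a small per-task module
    whose size is `peft_fraction` of the backbone (an assumed ratio).
    """
    full = num_tasks * backbone_params
    per_task_module = round(peft_fraction * backbone_params)
    peft = backbone_params + num_tasks * per_task_module
    return full, peft


# Illustrative numbers: a 125M-parameter backbone, 10 tasks, 1% per-task modules.
full, peft = total_stored_params(num_tasks=10, backbone_params=125_000_000,
                                 peft_fraction=0.01)
print(f"full fine-tuning: {full:,} params, PEFT: {peft:,} params")
```

Under these assumptions, full fine-tuning stores ten times the backbone, while PEFT stores about 1.1 times it.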
2. Methodological Taxonomy and Design Principles
PEFT methods can be systematically classified by the mechanism used to inject task-specific capacity:
- Selection-based/sparse tuning: Only a carefully chosen subset of the existing weights or biases is updated, with all other parameters frozen. Classic examples include BitFit (bias-only), LayerNorm-only tuning, and gradient- or Fisher-informed parameter masks (ValizadehAslani et al., 2024, Fu et al., 2022, Liao et al., 2023).
- Insertion-based approaches: Lightweight neural modules (adapters) are inserted at each layer or sub-layer of the model. These typically follow a bottleneck architecture: down-projection, nonlinearity, up-projection, and a residual addition (Chen et al., 2023, Baker et al., 1 Jun 2026).
- Reparameterization-based approaches: Model weight updates are parameterized as constrained transformations, typically low-rank matrices (LoRA, AdaLoRA, DoRA, PiCa, FLoRA, SVDiff), frequency-domain coefficients (sDCTFT, FourierFT, CDVFT), or structured matrix factorizations (e.g., column/row projections, circulant-diagonal products) (Shen et al., 2024, Hwang et al., 26 May 2025).
- Prompt/prefix tuning: Learnable input or intermediate sequence tokens are optimized (soft prompts), which control the network computation without altering model weights directly (Balne et al., 2024, Zhang et al., 23 Jan 2025).
- Hybrid or automated approaches: PEFT modules are assigned adaptively across layer groups, sometimes with meta-learned structure, e.g., S⁴ designs that greedily optimize layer grouping, allocation, and plug-in strategy (Chen et al., 2023, Zhang et al., 23 Jan 2025).
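Prompt tuning in its simplest, input-level form can be sketched in a few lines. The sketch assumes a toy frozen embedding table and NumPy arrays in place of a real model; only the hypothetical `soft_prompt` matrix would be trained, and prefix tuning, which injects learned vectors into every layer, is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab_size, d_model, num_prompt_tokens = 100, 16, 4

# Frozen pretrained token-embedding table (random stand-in for a real model).
frozen_embeddings = rng.normal(size=(vocab_size, d_model))

# The only trainable tensor: a small matrix of soft-prompt vectors.
soft_prompt = rng.normal(scale=0.02, size=(num_prompt_tokens, d_model))


def build_inputs(token_ids: np.ndarray) -> np.ndarray:
    """Prepend the trainable soft prompt to the frozen embeddings of the input."""
    token_embeds = frozen_embeddings[token_ids]  # (seq_len, d_model)
    return np.concatenate([soft_prompt, token_embeds], axis=0)


inputs = build_inputs(np.array([5, 17, 42]))
print(inputs.shape)  # (7, 16): 4 prompt vectors followed by 3 token embeddings
```

The frozen network then processes `inputs` unchanged, so task behaviour is steered entirely through the prompt vectors.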
3. Core Algorithms and Mathematical Formulations
Low-Rank Adaptation (LoRA)
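LoRA freezes the original weight $W_0 \in \mathbb{R}^{d \times k}$ and introduces a low-rank update $\Delta W = BA$, where $B \in \mathbb{R}^{d \times r}$ and $A \in \mathbb{R}^{r \times k}$, with $r \ll \min(d, k)$, so that the adapted layer computes $h = W_0 x + BAx$. Only $A$ and $B$ are trainable. This reduces the number of trainable parameters per adapted weight matrix from $dk$ to $r(d + k)$ (Chen et al., 2023, Baker et al., 1 Jun 2026).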
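The LoRA update can be sketched in NumPy as follows. The zero initialisation of `B` (so the update is zero at the start) and the `alpha / r` scaling factor follow common LoRA practice and are assumptions here, as are the dimensions. The forward pass never materialises the dense `d x k` update, and after training the product can be merged into the frozen weight.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r = 64, 32, 4   # output dim, input dim, LoRA rank (r << min(d, k))
alpha = 8.0           # scaling hyperparameter (assumed; applied as alpha / r)

W0 = rng.normal(size=(d, k))             # frozen pretrained weight
A = rng.normal(scale=0.01, size=(r, k))  # trainable, small random init
B = np.zeros((d, r))                     # trainable, zero init so B @ A = 0 at start


def lora_forward(x: np.ndarray) -> np.ndarray:
    """h = W0 x + (alpha / r) * B A x, computed without forming B @ A."""
    return W0 @ x + (alpha / r) * (B @ (A @ x))


x = rng.normal(size=(k,))
assert np.allclose(lora_forward(x), W0 @ x)  # identical to the frozen layer at init

full_params = d * k        # 2048 trainable params if W0 itself were tuned
lora_params = r * (d + k)  # 384 trainable params for A and B
print(full_params, lora_params)
```

Because the update is linear, `W0 + (alpha / r) * B @ A` can be precomputed once training is done, so inference adds no extra latency.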
Frequency-Domain Fine-Tuning
Recent work moves PEFT to the frequency domain, exploiting the energy compaction and decorrelation properties of the Discrete Cosine Transform (DCT). Selective DCT Fine-Tuning (sDCTFT) projects the LoRA-style weight change into DCT space, partitions it into frequency bands, and updates only selected high-energy, information-rich coefficients (Shen et al., 2024). The inverse DCT then reconstructs the dense weight delta in each forward pass.
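The select-then-reconstruct idea can be illustrated with an orthonormal 2-D DCT built from NumPy primitives. This is a simplified sketch, not sDCTFT itself: it ranks coefficient positions by the energy of a hypothetical `reference_delta` matrix and skips the frequency-band partitioning, and all names and sizes are assumptions. Only the `n` selected spectral coefficients would be trainable; the dense update is rebuilt by the inverse transform.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis: rows are cosine basis vectors, so C @ C.T = I."""
    freq = np.arange(n)[:, None]
    pos = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * pos + 1) * freq / (2 * n))
    C[0, :] /= np.sqrt(2.0)
    return C


def dct2(X: np.ndarray) -> np.ndarray:
    """2-D DCT: transform columns, then rows."""
    return dct_matrix(X.shape[0]) @ X @ dct_matrix(X.shape[1]).T


def idct2(Y: np.ndarray) -> np.ndarray:
    """Inverse 2-D DCT; orthonormality makes it a pair of transposes."""
    return dct_matrix(Y.shape[0]).T @ Y @ dct_matrix(Y.shape[1])


def select_top_energy(reference: np.ndarray, num_coeffs: int) -> np.ndarray:
    """Boolean mask of the num_coeffs DCT positions with the largest energy |coef|^2."""
    energy = dct2(reference) ** 2
    top = np.argsort(energy, axis=None)[::-1][:num_coeffs]
    mask = np.zeros(energy.size, dtype=bool)
    mask[top] = True
    return mask.reshape(energy.shape)


rng = np.random.default_rng(0)
d, k, n = 16, 12, 10
reference_delta = rng.normal(size=(d, k))  # hypothetical proxy for the weight change
mask = select_top_energy(reference_delta, n)

coeffs = np.zeros((d, k))
coeffs[mask] = rng.normal(size=n)  # the only trainable values: n spectral coefficients
delta_W = idct2(coeffs)            # dense d x k update rebuilt by the inverse DCT
```

Here `n = 10` trainable values produce a dense 16 x 12 update, which is the source of the compression.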
Sparse Parameter Selection
Selection-based PEFT includes:
- Magnitude-based: update the parameters with smallest or largest magnitudes in the pretrained weight vector (PaFi) (Liao et al., 2023).
- Fisher/gradient-based: use empirical Fisher information or gradient statistics to select the parameters that most affect the downstream loss (FISH Mask, SAM (Second-order Approximation Method), FPS, IRD) (Fu et al., 2022, ValizadehAslani et al., 2024, Yang et al., 31 Oct 2025, Dong et al., 2024).
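Both selection rules reduce to building a binary mask over a weight tensor and applying gradient updates only where the mask is set. The sketch below assumes per-sample gradients are already available as an array of shape `(num_samples, *weight_shape)` and uses the standard empirical-Fisher estimate (mean squared gradient). The function names are hypothetical, and the magnitude criterion is exposed through a `smallest` flag rather than fixed to PaFi's exact choice.

```python
import numpy as np


def magnitude_mask(weights: np.ndarray, k: int, smallest: bool = True) -> np.ndarray:
    """Select the k entries with the smallest (or largest) absolute pretrained value."""
    order = np.argsort(np.abs(weights), axis=None)
    chosen = order[:k] if smallest else order[::-1][:k]
    mask = np.zeros(weights.size, dtype=bool)
    mask[chosen] = True
    return mask.reshape(weights.shape)


def fisher_mask(per_sample_grads: np.ndarray, k: int) -> np.ndarray:
    """Empirical Fisher diagonal = mean squared per-sample gradient; keep the top-k entries."""
    fisher = np.mean(per_sample_grads ** 2, axis=0)
    top = np.argsort(fisher, axis=None)[::-1][:k]
    mask = np.zeros(fisher.size, dtype=bool)
    mask[top] = True
    return mask.reshape(fisher.shape)


def masked_sgd_step(weights: np.ndarray, grad: np.ndarray, mask: np.ndarray,
                    lr: float = 0.1) -> np.ndarray:
    """Only masked entries move; everything else keeps its pretrained value."""
    return weights - lr * grad * mask


W = np.array([[0.5, -0.01], [2.0, 0.2]])
mask = magnitude_mask(W, k=2)  # the two smallest-magnitude entries
W_new = masked_sgd_step(W, grad=np.ones_like(W), mask=mask)
```

Since the mask is fixed before training, the sparse delta `W_new - W` is all that needs to be stored per task.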
Adapter Architectures
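Adapters follow the form $h \leftarrow h + W_{\text{up}}\,\sigma(W_{\text{down}} h)$, with $W_{\text{down}} \in \mathbb{R}^{m \times d}$ (down-projection to the bottleneck), $W_{\text{up}} \in \mathbb{R}^{d \times m}$ (up-projection), bottleneck size $m \ll d$, and a nonlinearity $\sigma$ (Chen et al., 2023, Baker et al., 1 Jun 2026).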
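A single bottleneck adapter takes only a few lines of NumPy. The ReLU nonlinearity, the zero-initialised up-projection (which makes the adapter start as the identity), and the omission of bias terms are assumptions for this sketch rather than details from the cited papers.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 32, 4  # hidden size d and bottleneck size m (m << d)

W_down = rng.normal(scale=0.02, size=(m, d))  # trainable down-projection
W_up = np.zeros((d, m))                       # trainable up-projection, zero init (assumed)


def relu(z: np.ndarray) -> np.ndarray:
    return np.maximum(z, 0.0)


def adapter(h: np.ndarray) -> np.ndarray:
    """Bottleneck adapter with residual: h + W_up @ relu(W_down @ h)."""
    return h + W_up @ relu(W_down @ h)


h = rng.normal(size=(d,))
assert np.allclose(adapter(h), h)  # exact identity at init because W_up is zero
print(W_down.size + W_up.size)     # 2 * d * m = 256 trainable weights (no biases)
```

Unlike LoRA, the nonlinearity means the adapter cannot be merged into the surrounding weights, which is why it adds a small inference cost.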
Representation Editing
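Instead of tuning weights, RED ("Representation Editing") modifies hidden representations directly: $h' = \ell_{\text{scale}} \odot h + \ell_{\text{bias}}$, where only the vectors $\ell_{\text{scale}}, \ell_{\text{bias}} \in \mathbb{R}^{d}$ are trained per layer (Wu et al., 2024).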
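Because this kind of edit only rescales and shifts a hidden vector, a sketch needs just two vectors per edited layer. The identity initialisation (a scale of ones and a bias of zeros) is an assumption here, chosen so that the edited model initially behaves exactly like the frozen one.

```python
import numpy as np

d = 8
scale = np.ones(d)  # trainable scaling vector, initialised to 1 (assumed)
bias = np.zeros(d)  # trainable bias vector, initialised to 0 (assumed)


def red_edit(h: np.ndarray) -> np.ndarray:
    """Element-wise edit h' = scale * h + bias; broadcasts over leading batch dims."""
    return scale * h + bias


h = np.arange(d, dtype=float)
assert np.allclose(red_edit(h), h)  # no change before training
```

Each edited layer therefore contributes only `2 * d` trainable values.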
4. Empirical Performance and Efficiency
PEFT methods consistently deliver competitive results across NLP, vision, protein modeling, and scientific tasks:
- On GLUE (RoBERTa, T5, BERT), LoRA and sDCTFT, with as few as roughly 0.03–0.05M trainable parameters, match or surpass full fine-tuning of 125M+ parameters; on LLaMA3.1-8B, sDCTFT uses roughly 760× fewer trainable parameters than LoRA (Shen et al., 2024, Chen et al., 2023, ValizadehAslani et al., 2024).
- In instance segmentation, LoRA and adapters tune only 1–6% of the model yet reach roughly 95–98% of full fine-tuning AP; LoRA excels in low distribution-shift settings, while adapters offer more capacity for complex, structured domain shifts (Baker et al., 1 Jun 2026).
- On low-resource machine translation (mBART-50), Houlsby+Inversion adapters and mix-and-match PEFT variants yield BLEU gains of roughly 10–40% over full fine-tuning baselines, with only 4–9% overhead (Su et al., 2024).
- In seismic full-waveform inversion, LoRA-based PEFT substantially reduces per-task adaptation cost while matching full fine-tuning and improving out-of-distribution (OOD) robustness (Ghosal et al., 2024).
- For time-series foundation models, TRACE combines Gated DSIC masking with low-rank heads to match or exceed full fine-tuning while training fewer than 3% of the parameters (Li et al., 21 Mar 2025).
- Across 24 image-classification and transfer tasks, FPS achieves mean accuracies close to the state of the art, with markedly lower peak memory and selection latency than gradient-based selection (Yang et al., 31 Oct 2025).
- Data-driven selection methods (IRD) outperform random sampling at identifying which parameters to tune for a given sample distribution, improving GLUE performance under strong sparsity (Dong et al., 2024).
5. Theoretical Insights and Design Patterns
Parameter sparsity has a regularizing effect on stability and generalization in fine-tuning; by freezing the majority of weights, PEFT reduces the output sensitivity to data perturbation, resulting in lower variance and sometimes better generalization than dense fine-tuning (Fu et al., 2022). Analytical results link PEFT selection to implicit quadratic regularization on non-updated parameters and establish that optimal parameter selection is NP-hard, motivating gradient- or second-order-based heuristics such as SAM (Fu et al., 2022).
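The intuition behind second-order selection can be sketched under a diagonal-Hessian assumption: updating parameter $i$ alone by $\delta$ changes the loss by roughly $g_i \delta + \tfrac{1}{2} H_{ii} \delta^2$, which is minimised at $\delta = -g_i / H_{ii}$ with a predicted decrease of $g_i^2 / (2 H_{ii})$. Ranking by that score can differ from ranking by gradient magnitude alone. This illustrates the general idea and is not claimed to be Fu et al.'s exact SAM procedure; the gradient and curvature values are made up.

```python
import numpy as np


def second_order_scores(grad: np.ndarray, hess_diag: np.ndarray,
                        eps: float = 1e-12) -> np.ndarray:
    """Predicted loss decrease g_i^2 / (2 H_ii) if parameter i alone is optimised.

    Assumes a diagonal, positive Hessian approximation; eps avoids division by zero.
    """
    return grad ** 2 / (2.0 * (hess_diag + eps))


def top_k_indices(scores: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k largest scores, highest first."""
    return np.argsort(scores, axis=None)[::-1][:k]


grad = np.array([0.5, -2.0, 0.1, 1.0])
hess_diag = np.array([0.1, 1.0, 0.01, 4.0])  # hypothetical curvature estimates

by_second_order = top_k_indices(second_order_scores(grad, hess_diag), 2)
by_gradient = top_k_indices(np.abs(grad), 2)
```

Here parameter 3 has a larger gradient than parameter 0, but its high curvature means the second-order score prefers parameter 0.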
Automated PEFT design spaces reveal robust patterns: "spindle" layer grouping (fewer adapters at input/output, more in the middle), uniform parameter allocation per group, all-groups-tuned, and group-specific strategy assignment yield superior multi-task and cross-backbone transfer (Chen et al., 2023).
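The spindle pattern can be made concrete with a small helper that splits a stack of layers into contiguous groups, small at both ends and larger in the middle. The weighting rule below is a hypothetical choice for illustration; the exact group sizes found by Chen et al. may differ.

```python
def spindle_group_sizes(num_layers: int, num_groups: int) -> list[int]:
    """Split layers into contiguous groups that are small at the ends, large in the middle.

    Hypothetical rule: group g gets weight min(g + 1, num_groups - g), sizes are
    proportional to the weights, and leftover layers go to the heaviest groups first.
    """
    weights = [min(g + 1, num_groups - g) for g in range(num_groups)]
    total = sum(weights)
    sizes = [num_layers * w // total for w in weights]
    by_weight = sorted(range(num_groups), key=lambda g: -weights[g])
    for i in range(num_layers - sum(sizes)):
        sizes[by_weight[i % num_groups]] += 1
    return sizes


print(spindle_group_sizes(12, 4))  # [2, 4, 4, 2]
```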
Decomposition-centric analysis argues that PEFT methods can be reframed as subspace modification or extension via low-rank or structured basis adaptation. Extension methods (e.g., FLoRA with unconstrained intermediate transformations) empirically outperform constrained factorizations (LoRA, AdaLoRA), which the authors attribute to fewer coupling constraints (Si et al., 2024).
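The contrast between a plain low-rank factorisation and one with an intermediate transformation can be seen by comparing `delta = B @ A` with `delta = B @ G @ A`, where `G` is an unconstrained, trainable r x r core. This is a simplified matrix sketch assumed for illustration, not the exact formulation of Si et al. Both updates still have rank at most r, so the extra r * r parameters change how the update is parameterised and optimised, not the set of reachable updates.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r = 48, 40, 4

B = rng.normal(size=(d, r))
A = rng.normal(size=(r, k))
G = rng.normal(size=(r, r))  # unconstrained r x r core: the extra degree of freedom

delta_lora = B @ A      # LoRA-style: the two factors are coupled directly
delta_core = B @ G @ A  # extension-style: an intermediate transform sits between them

# Both ranks are at most r; only the parameter counts differ.
print(np.linalg.matrix_rank(delta_lora), np.linalg.matrix_rank(delta_core))
print(r * (d + k), r * (d + k) + r * r)  # 352 vs 368 trainable parameters
```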
6. Limitations, Use-Case Specificity, and Practical Recommendations
PEFT strategy effectiveness is context-dependent:
- LoRA and related decompositions are efficient and perform well under moderate domain shift and in low-latency settings (on-device and scalable serving), but may underfit with large datasets or strongly nonstationary domains.
- Adapter bottlenecks or multi-head hybrid PEFT strategies excel where task adaptation requires nonlinear capacity or significant feature transformation (medical imaging, scientific data, highly structured vision tasks) (Baker et al., 1 Jun 2026, Balne et al., 2024).
- Frequency-domain adaptations (sDCTFT, CDVFT) exploit gradient sparsity and spatial-frequency structure for dramatic compression and are well-suited to vision and large-scale LLMs with spectral compression properties (Shen et al., 2024, Hwang et al., 26 May 2025).
- Selection-based methods (PaFi, LayerNorm-only, BitFit) approach full fine-tuning performance in strongly over-parameterized regimes, especially when the pretrained model needs only a small adjustment (ValizadehAslani et al., 2024, Liao et al., 2023).
Recommended practices include making LayerNorm parameters part of the tuned set in transformers, using group-wise or Fisher-ranked parameter selection, validating adapter size or LoRA rank on held-out data, and considering hybrid or automated design-space methods for heterogeneous or multitask adaptation (Zhang et al., 23 Jan 2025, Chen et al., 2023, ValizadehAslani et al., 2024).
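Tuning LayerNorm parameters (and, for BitFit, bias terms) amounts to a name-based freezing rule over a model's parameter list. The sketch below uses hypothetical BERT-style parameter names and shapes in a plain dictionary instead of a real model; the substring tests are naming assumptions that would need adjusting for a given checkpoint. In PyTorch, such flags would typically be applied by setting `requires_grad` on each parameter.

```python
import math


def select_trainable(param_shapes: dict[str, tuple[int, ...]],
                     include_bias: bool = True,
                     include_layernorm: bool = True) -> dict[str, bool]:
    """Mark parameters trainable by name: biases (BitFit-style) and/or LayerNorm params.

    The name patterns below are assumptions about the checkpoint's naming scheme.
    """
    flags = {}
    for name in param_shapes:
        lowered = name.lower()
        is_bias = lowered.endswith(".bias")
        is_ln = "layernorm" in lowered or "layer_norm" in lowered
        flags[name] = (include_bias and is_bias) or (include_layernorm and is_ln)
    return flags


# Hypothetical parameter names and shapes for one BERT-like layer.
params = {
    "layer.0.attention.query.weight": (768, 768),
    "layer.0.attention.query.bias": (768,),
    "layer.0.output.LayerNorm.weight": (768,),
    "layer.0.output.LayerNorm.bias": (768,),
    "layer.0.intermediate.dense.weight": (3072, 768),
}
flags = select_trainable(params)
trainable = sum(math.prod(s) for name, s in params.items() if flags[name])
total = sum(math.prod(s) for s in params.values())
print(f"{trainable}/{total} = {trainable / total:.2%}")
```

In this toy layer the selected tensors make up well under 1% of the parameters; setting `include_bias=False` gives a LayerNorm-only variant.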
7. Future Directions and Open Problems
Future research in PEFT is poised to address open challenges:
- Unified cross-modal benchmarking for systematic PEFT assessment (Zhang et al., 23 Jan 2025, Balne et al., 2024).
- Theoretical investigation of the scaling laws, expressivity, and generalization bounds of low-rank and subspace-adaptive methods (Si et al., 2024).
- Automated and adaptive module selection and resource allocation via meta-learning or neural architecture search (Chen et al., 2023, Zhang et al., 23 Jan 2025).
- Extending PEFT to streaming, continual learning, privacy-preserving, and federated contexts, including task-agnostic and modular plug-and-play adapters (Liao et al., 2023, Balne et al., 2024).
- Spectral and frequency-domain methods offer routes to even greater compression; interpretability and explainability of PEFT adaptation pathways remain essential in scientific and high-stakes domains (Shen et al., 2024, Hwang et al., 26 May 2025).
Parameter-efficient fine-tuning remains a critical enabler of scalable, sustainable, and versatile transfer learning in state-of-the-art foundation models, continually advancing in sophistication and breadth of application (Zhang et al., 23 Jan 2025, Balne et al., 2024, Shen et al., 2024, Chen et al., 2023, Fu et al., 2022).