Finetuning-based Methods

Updated 9 January 2026
  • Finetuning-based methods are adaptation strategies that update pretrained model parameters to improve performance on novel tasks.
  • They encompass approaches like parameter-efficient tuning, block-wise updates, reinforcement learning, and meta-heuristic searches to balance efficiency and adaptation.
  • Empirical benchmarks and strategic parameter selection demonstrate that targeted finetuning enhances model robustness and generalization across diverse domains.

Finetuning-based methods comprise a broad class of adaptation algorithms that adjust the parameters, representations, or outputs of pretrained models to improve their performance on new tasks, domains, or data distributions. Unlike training from scratch, finetuning leverages the representational power of large pretrained models—such as LLMs, vision transformers, or other foundation models—by applying targeted updates, typically using task-specific data and losses. The spectrum of finetuning strategies spans from full parameter updates to highly selective parameter-efficient interventions, and from pure gradient-based schemes to meta-heuristic and reinforcement learning approaches. Below, salient methods, theoretical principles, algorithmic workflows, practical considerations, and representative advances are systematically reviewed, with direct citations to arXiv research.

1. Fundamentals and Taxonomy of Finetuning

Finetuning refers to the post-pretraining adaptation of a model, usually with some form of risk minimization on new-task data. Standard finetuning is fully parametric: all weights are unfrozen and updated by minimizing a downstream loss via gradient descent. Major alternative paradigms have emerged:

  • Parameter-efficient finetuning (PEFT): Updates are restricted to small subspaces, such as adapters, low-rank parameterizations (LoRA), bias vectors (BitFit), or representation-level interventions (ReFT) (Wu et al., 2024); a LoRA-style update is sketched at the end of this section.
  • Block-wise and localized finetuning: Organizes updates by selecting salient contiguous layer blocks or submodules, aiming to balance overfitting risk and adaptation capacity (Barakat et al., 2023, Yang et al., 26 Sep 2025).
  • Meta-heuristic and non-gradient search: Population-based or derivative-free search strategies (GA, PSO) explore the weight landscape in the vicinity of the pretrained optimum in search of improved local solutions (Rosa et al., 2022).
  • Reinforcement learning–based finetuning: Applies RL-style objectives—policy gradients or reward regularization—particularly in alignment-sensitive settings such as sequential decision-making, text generation, and structured control (Luo et al., 1 Jan 2026, Tirotta et al., 2021).

The design space includes further axes: which parameters to update, how to select training examples, and the exact loss and optimization objective.
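To make the contrast between full and parameter-efficient updates concrete, the following is a minimal LoRA-style sketch in PyTorch; the rank, scaling factor, and initialization are illustrative assumptions rather than settings from any cited paper.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA-style wrapper (sketch): the pretrained weight stays frozen
    and only a low-rank residual B @ A is trained."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # freeze pretrained weights
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x):
        # Pretrained path plus trainable low-rank correction.
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scaling
```

Because only A and B receive gradients, optimizer state and saved checkpoints stay small relative to full finetuning.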

2. Standard, Localized, and Block-wise Finetuning

Traditional finetuning applies a uniform update schedule to all model weights via minibatch stochastic gradient methods (SGD, Adam). However, this approach is often suboptimal for domain transfer (overfitting, catastrophic forgetting), data-scarce regimes (unstable gradients), or lifelong editing (poor scalability).

Localized and Block-wise Strategies

  • Block-wise Optimization: Only a subset of contiguous layers (a block) is updated, with block selection driven by empirical performance (e.g., on a validation split). Candidate blocks can be constructed using natural network boundaries (e.g., pooling or activation layers), top-ranked layers by validation accuracy, or fixed-size sliding windows. Selection is carried out by cross-validation, and the best-performing block is then finetuned on the remaining data; a minimal selection loop is sketched at the end of this section. This strategy achieves higher accuracy than full or last-layer-only finetuning and reduces overfitting variance (Barakat et al., 2023).
  • Localized Model Editing (LocFT-BF): For incremental or continual "edits" to LLMs (e.g., adding facts), finetuning is maximally restricted: only a single "down-projection" matrix in an upper transformer layer is updated, using an epoch-based, mini-batch breadth-first schedule. This preserves nearly all original knowledge and is highly scalable—successful up to 100k edits on >70B-parameter models—outperforming prior model-editing approaches in both reliability and generalization (Yang et al., 26 Sep 2025).
Strategy              Parameters Updated           Typical Application
Full Finetuning       All layers, all weights      Generic domain adaptation
Block-wise            Selected contiguous layers   Robust small-data tuning
Localized (LocFT-BF)  Single MLP_down matrix       Sequential model editing

The principal advantage is enhanced control over the adaptation–preservation trade-off and increased reliability under regime shifts.
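A minimal sketch of the block-selection loop described above, assuming each candidate block exposes its parameters like an nn.Module and that train_fn and evaluate are user-supplied; the trial/validation protocol here is illustrative, not the exact procedure of (Barakat et al., 2023).

```python
import copy

def select_block(model, candidate_blocks, train_fn, evaluate, trial_data, val_data):
    """Block-wise selection sketch: for each candidate block, restore the
    pretrained weights, unfreeze only that block, run a short trial finetune,
    and keep the block with the best validation score."""
    pretrained_state = copy.deepcopy(model.state_dict())
    best_block, best_score = None, float("-inf")
    for block in candidate_blocks:            # e.g. contiguous layer ranges
        model.load_state_dict(pretrained_state)
        for p in model.parameters():
            p.requires_grad = False
        for p in block.parameters():          # unfreeze only this block
            p.requires_grad = True
        train_fn(model, trial_data)           # short trial finetune
        score = evaluate(model, val_data)
        if score > best_score:
            best_block, best_score = block, score
    model.load_state_dict(pretrained_state)   # final finetuning of best_block follows
    return best_block
```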

3. Parameter-efficient, Representation, and Data-driven Approaches

Beyond restricting the layer space, recent advances target parameter and data efficiency, seeking to maximize adaptation per parameter or per example.

  • Representation Finetuning (ReFT, LoReFT): Instead of updating model weights, these methods insert learned intervention functions—low-rank, activation-space subspace edits—at specific hidden layers, acting solely during the forward pass. LoReFT achieves state-of-the-art adaptation efficiency, often outperforming LoRA/Adapter PEFTs while using 10–65× fewer parameters (Wu et al., 2024); a minimal intervention sketch follows the table below.
  • Targeted Efficient Finetuning (IRD): Recognizing that not all samples or parameters are equally informative, IRD selects mask subsets via iterative Fisher information maximization, alternating between focusing on informative samples and informative parameters. This data-centric mask selection improves robustness under distribution shift and outperforms masks computed from randomly selected samples (e.g., FISH Mask) on GLUE and other LLM tasks (Dong et al., 2024); a generic Fisher-mask sketch appears at the end of this section.
  • Logits-based Finetuning (LFT): LFT trains with “enriched” label targets that merge the ground-truth one-hot distribution with Top-K teacher logits, fusing knowledge distillation with SFT and correcting for underspecified or ambiguous targets while retaining diversity. It yields consistent 7–23% absolute accuracy gains on mathematical reasoning benchmarks over standard SFT (Li et al., 30 May 2025).
Approach           Update Target     Efficiency Metric   SOTA Domains
LoReFT (ReFT)      Hidden states     #parameters         Commonsense, GLUE
IRD (FISH Mask+)   Param/data mask   Sample/param size   GLUE, LLMs
LFT                Output targets    Train signal        Math LLMs
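A minimal sketch of a LoReFT-style hidden-state intervention, following the low-rank form phi(h) = h + R^T(Wh + b - Rh) reported for LoReFT; the rank, initialization, and the handling of R's orthogonality constraint are simplified assumptions.

```python
import torch
import torch.nn as nn

class LoReFTIntervention(nn.Module):
    """Low-rank representation intervention on hidden states (sketch):
    phi(h) = h + R^T (W h + b - R h), applied only in the forward pass.
    The base model weights are never updated."""
    def __init__(self, hidden_dim: int, rank: int = 4):
        super().__init__()
        # The original method keeps R row-orthonormal throughout training;
        # here R is only initialized orthogonally, for brevity.
        self.R = nn.Parameter(torch.empty(rank, hidden_dim))
        nn.init.orthogonal_(self.R)
        self.W = nn.Linear(hidden_dim, rank)

    def forward(self, h):                     # h: (..., hidden_dim)
        delta = self.W(h) - h @ self.R.T      # (..., rank) edit in the subspace
        return h + delta @ self.R             # project the edit back to hidden space
```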

Parameter-efficient and data-driven methods are crucial in settings with resource constraints, long adaptation sequences, or a need for interpretability.
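The Fisher-based masking that underlies IRD can be approximated as below; this is a generic squared-gradient Fisher proxy with top-k parameter selection, not the exact iterative sample/parameter alternation of (Dong et al., 2024).

```python
import torch

def fisher_parameter_mask(model, loss_fn, samples, keep_ratio=0.005):
    """Score each parameter by an empirical Fisher proxy (accumulated squared
    gradient over the chosen samples) and keep only the top fraction for
    finetuning. IRD alternates this step with re-selecting informative samples."""
    scores = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    for x, y in samples:
        model.zero_grad()
        loss_fn(model(x), y).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                scores[n] += p.grad.detach() ** 2
    flat = torch.cat([s.flatten() for s in scores.values()])
    k = max(1, int(keep_ratio * flat.numel()))
    threshold = torch.topk(flat, k).values.min()
    return {n: (s >= threshold) for n, s in scores.items()}   # boolean masks
```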

4. Reinforcement Learning and Pure RL-gradient Finetuning

Certain domains, especially sequential decision-making and LLM alignment, cannot be optimally served by pure supervised losses. Instead, RL-based objectives offer direct optimization of policy quality.

  • Online Finetuning of Decision Transformers (DTs): Prior extensions of DTs to online RL relied on supervised updating with "hindsight return relabeling." This is incompatible with RL policy-gradient methods (e.g., GRPO), as return relabeling destroys importance sampling consistency. Removing relabeling and adopting sub-trajectory GRPO updates—featuring sub-trajectory returns, sequence-level importance weights, and active uncertainty-driven sampling—enables stable, RL-gradient-only finetuning; a schematic of the sequence-level update appears at the end of this section. GRPO-DT sets new state-of-the-art D4RL-normalized returns, outperforming supervised and hybrid baselines (Luo et al., 1 Jan 2026).
  • RL-based Text Generation Tuning (OptAGAN): Combines VAE-GAN latents with RL finetuning of the decoder (GPT-2), where the reward mixes extrinsic (BLEU score) and entropy-derived intrinsic signals. The RL stage significantly improves BLEU and sample diversity over both vanilla GAN and VAE models (Tirotta et al., 2021).
  • Alignment in LLMs: Similar RL-gradient-only protocols (e.g., PPO, GRPO) have become standard in training RL-aligned LMs, underpinning their stability and reward sensitivity compared to purely supervised or relabel-based updates (Luo et al., 1 Jan 2026).

This cluster demonstrates the necessity of objective compatibility—especially for importance sampling and distribution-matching—in scalable online adaptation.
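The sequence-level update can be sketched as a group-relative, clipped importance-weighted objective; the grouping, clipping constant, and the way sub-trajectory returns enter the advantage are assumptions for illustration rather than the exact GRPO-DT procedure.

```python
import torch

def grpo_sequence_loss(logp_new, logp_old, returns, clip_eps=0.2):
    """GRPO-style sketch: group-relative advantages computed from
    (sub-)trajectory returns, combined with a clipped sequence-level
    importance ratio. logp_new / logp_old are summed log-probabilities
    per sampled trajectory under the current and sampling policies."""
    adv = (returns - returns.mean()) / (returns.std() + 1e-8)   # group baseline
    ratio = torch.exp(logp_new - logp_old)                      # sequence-level IS weight
    unclipped = ratio * adv
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * adv
    return -torch.min(unclipped, clipped).mean()
```

Note that the importance ratio is computed at the sequence level, which is exactly what hindsight return relabeling would invalidate.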

5. Fine-grained and Domain-targeted Strategies

Finetuning-based methods are also highly specialized for challenging adaptation scenarios:

  • Tabular Foundation Models: For transformer-based tabular models (TabPFNv2), full finetuning (all weights) outperforms PEFT variants (LoRA, embedding/LayerNorm tuning) in both speed and accuracy. Full adaptation reshapes retrieval-augmented mechanisms—post-finetuning test–train query–key similarity more accurately reflects label similarity, sharpening attention and reducing prediction entropy. However, this approach is less robust under temporal drift or major distribution shift (Rubachev et al., 10 Jun 2025).
  • Time Series Foundation Models (MSFT): Naive single-scale finetuning of pretrained multi-scale TSFMs incurs overfitting and fails to exploit hierarchical priors. MSFT explicitly constructs and finetunes at multiple downsampled temporal scales, using scale-specific adapters, decoupled self-/cross-attention, and learned output mixing. Empirically, MSFT sets new benchmarks on long-term and probabilistic time series forecasting, outperforming both PEFT and from-scratch baselines (Qiao et al., 17 Jun 2025).
  • Medical Image Analysis with RepSim: Enforces explicit representational similarity between pre- and post-finetuning feature covariances (via a learnable orthogonal transformation, regularized by linear CKA; the measure is sketched after this list), preserving broad transferability and flattening the loss landscape while losing only ≈1% in accuracy compared to full finetuning (Zu et al., 10 Mar 2025).
  • Concept-wise Finetuning: Causally targets negative transfer by maximizing mutual information among rare/concept feature patches and adjusting predictions via front-door attention mechanisms. These interventions significantly mitigate both rare-feature underfitting and spurious correlation, achieving up to 4.76% top-1 accuracy improvements in low-sample regimes (Yang et al., 2023).
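Linear CKA, the similarity measure used to regularize RepSim, has a compact closed form; the sketch below is the standard centered linear CKA, and how it enters the RepSim training loss is left as an assumption in the final comment.

```python
import torch

def linear_cka(X, Y):
    """Linear CKA between two feature matrices X, Y of shape (n_samples, dim).
    It equals 1 for representations identical up to orthogonal transformation
    and isotropic scaling."""
    X = X - X.mean(dim=0, keepdim=True)        # center features
    Y = Y - Y.mean(dim=0, keepdim=True)
    cross = torch.norm(Y.T @ X) ** 2           # ||Y^T X||_F^2
    return cross / (torch.norm(X.T @ X) * torch.norm(Y.T @ Y))

# Assumed usage as a regularizer during finetuning:
#   loss = task_loss - lam * linear_cka(features_pretrained, features_current)
```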

6. Specialized Finetuning for Robustness, Model Editing, and Optimization

Additional dimensions of finetuning-based methods address robustness, lifelong adaptation, and optimizer design:

  • Anchor-based Robust Finetuning (ARF): To preserve domain-shift and zero-shot generalization in vision-language models (CLIP), ARF introduces "anchor" losses combining class-level contrastive loss with (i) caption-enriched text supervision and (ii) retrieved web-scale image–text pair alignments. These auxiliary alignment objectives prevent collapse of open-vocabulary geometry, with substantial gains on out-of-domain and zero-shot transfer benchmarks (Han et al., 2024).
  • Explanation-based Finetuning: Requiring both label and free-text explanation generation during finetuning diversifies gradient signals and decreases model reliance on spurious cues, halving OOD accuracy drop and reducing reliance on shortcut features—even when explanations are synthetic (Ludan et al., 2023).
  • Optimizer Innovations (PROFIT): Classical optimizers are not structure-aware of the pre-finetuned critical point. PROFIT regularizes the update direction by orthogonalizing the new-task gradient with respect to a reference-weight displacement (a literal sketch of this projection follows this list); this efficient anchor-based mechanism reduces catastrophic forgetting while improving downstream accuracy across vision, motion prediction, and language tasks (Chakravarthy et al., 2024).
  • Meta-heuristic Fine-tuning: For domains where gradient-based updates may become stuck in sharp local minima, bio-inspired meta-heuristics (GA, PSO) perform local search within a bound around the pretrained parameter vector, offering an escape from poor optima and small accuracy gains without architectural changes (Rosa et al., 2022).
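The PROFIT mechanism, as described above, amounts to projecting out the gradient component along the displacement from the reference weights; the following is a literal sketch of that description, not the published update rule.

```python
import torch

@torch.no_grad()
def orthogonalized_step(param, ref_param, lr=1e-4, eps=1e-12):
    """Anchor-based update sketch: remove the gradient component along the
    displacement (param - ref_param) before the gradient step, so the update
    does not push further along directions already moved away from the
    pretrained solution."""
    d = (param - ref_param).flatten()
    g = param.grad.flatten()
    coef = (g @ d) / (d @ d + eps)            # projection coefficient onto displacement
    g_orth = g - coef * d                     # component of gradient orthogonal to it
    param -= lr * g_orth.view_as(param)
```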

7. Empirical Highlights, Practical Recommendations, and Future Directions

Empirical benchmarking across domains, architectures, and adaptation regimes converges on the necessity of tailoring finetuning-based methods to structure, scale, and stability constraints:

  • Strategic parameter selection—whether by localized submatrices, blocks, or activation subspaces—yields large improvements in both editability and efficiency when compared to ad hoc or all-parameter updating (Yang et al., 26 Sep 2025, Barakat et al., 2023, Wu et al., 2024).
  • Data-centric and mask optimization (e.g., IRD) improves both accuracy and robustness to distribution shift under PEFT constraints (Dong et al., 2024).
  • RL-gradient and sequence-level objectives are essential in online or reward-driven adaptation, particularly to ensure policy update consistency, stable convergence, and effective exploration (Luo et al., 1 Jan 2026).
  • Auxiliary objectives and architectures—such as explanation generation, anchor-based alignment, and multi-scale or multi-modal mixing—are foundational for preserving generalization under domain shift, negative transfer, or catastrophic editing (Zu et al., 10 Mar 2025, Han et al., 2024, Yang et al., 2023).
  • Optimization infrastructure—e.g., proximal updates and meta-heuristics—further buffer catastrophic forgetting and optimize for both efficiency and final accuracy (Chakravarthy et al., 2024, Rosa et al., 2022).

Trends indicate increasing interest in scalable, modular, and robust finetuning protocols for foundation models in language, vision, tabular, and sequential domains, as well as hybrid adaptation mechanisms that combine activation- and parameter-centric updates.


Key Papers Referenced:

  • Online pure RL-gradient finetuning for Decision Transformers: "Online Finetuning Decision Transformers with Pure RL Gradients" (Luo et al., 1 Jan 2026)
  • Localized, breadth-first model editing: "Fine-tuning Done Right in Model Editing" (Yang et al., 26 Sep 2025)
  • Representation finetuning and LoReFT: "ReFT: Representation Finetuning for LLMs" (Wu et al., 2024)
  • Data-driven PEFT mask optimization: "Targeted Efficient Fine-tuning: Optimizing Parameter Updates with Data-Driven Sample Selection" (Dong et al., 2024)
  • Logits-based target enrichments: "Logits-Based Finetuning" (Li et al., 30 May 2025)
  • Multi-scale time series finetuning: "Multi-Scale Finetuning for Encoder-based Time Series Foundation Models" (Qiao et al., 17 Jun 2025)
  • RL-gradients and sequence-level updates: (Luo et al., 1 Jan 2026, Tirotta et al., 2021)
  • Robustness via auxiliary objectives: (Han et al., 2024, Ludan et al., 2023)
  • Optimizer advancements: "PROFIT: A Specialized Optimizer for Deep Fine Tuning" (Chakravarthy et al., 2024)
  • Block-wise optimization: "Improving Reliability of Fine-tuning with Block-wise Optimisation" (Barakat et al., 2023)
  • Domain adaptation and generalization in handwriting recognition: "Fine-tuning Is a Surprisingly Effective Domain Adaptation Baseline in Handwriting Recognition" (Kohút et al., 2023)
  • Negative transfer and causal/attention correction: "Concept-wise Fine-tuning Matters in Preventing Negative Transfer" (Yang et al., 2023)

These works delineate the state of the art and current practices in the design, analysis, and practical deployment of finetuning-based methods across the machine learning landscape.
