Singular Value Fine-Tuning for FSCIL

Updated 4 December 2025

The paper introduces SVFCL, a singular value fine-tuning method that restricts parameter updates to singular values, effectively mitigating catastrophic forgetting and overfitting.
It leverages SVD by freezing the U and V components and tuning only the diagonal singular values, which preserves the learned feature subspace while enabling efficient adaptation.
Empirical results demonstrate that SVFCL achieves state-of-the-art accuracy on benchmarks with only 0.11% trainable parameters, outperforming conventional PEFT methods.

Singular Value Fine-Tuning for Few-Shot Class-Incremental Learning (SVFCL) is a parameter-efficient adaptation method designed to address both catastrophic forgetting and overfitting in few-shot class-incremental learning (FSCIL) scenarios. SVFCL leverages the structure of singular value decomposition (SVD) to constrain updates on large pre-trained models, offering significant improvements over conventional parameter-efficient fine-tuning (PEFT) baselines such as prompt tuning and Low-Rank Adaptation (LoRA), especially in the regime of incremental tasks with highly limited data per new class (Wang et al., 13 Mar 2025).

1. Mathematical Foundations and Core Algorithm

The core principle of SVFCL is the decomposition of a model weight matrix $W \in \mathbb{R}^{m \times n}$ via singular value decomposition:

$W = U \Sigma V^\top$

where $U \in \mathbb{R}^{m \times m}$ and $V \in \mathbb{R}^{n \times n}$ are orthonormal (left and right singular vectors), and $\Sigma$ is diagonal with non-negative singular values (Eq. 2 in (Wang et al., 13 Mar 2025)).

Unlike LoRA or prompt tuning, which introduce trainable low-rank adapters or prompt parameters, SVFCL freezes $U$ and $V$ for all future learning and tunes only the singular values $\Sigma$ —effectively scaling the axes of the pre-trained subspace without shifting its orientation. For each incremental session $t$ , an adapter $\Delta \Sigma_t$ is learned, and the effective incremental weight matrix is reconstructed as:

$W_t = W + U \cdot M(\{\Delta \Sigma_i\}_{i=0}^{t-1}, \Delta\Sigma_t) \cdot V^\top$

where $M$ denotes merging of all previously learned singular value adapters with the current one (usually by summation, as in Eq. 5).

The optimization at step $t$ solves:

$\min_{\Delta\Sigma_t,\;\omega}\;\mathbb{E}_{(x,y)\in D^t} L(g_\omega(f_\theta(x)), y)$

with $\Delta\Sigma_t$ and classifier parameters $\omega$ updated by gradient descent. The entire procedure is summarized as follows (from Algorithm 1, (Wang et al., 13 Mar 2025)):

Input: Tasks {D^0, D^1, ..., D^M}, pre-trained encoder f_θ, initial NCM classifier g_ω, matrix W.

1. Compute SVD: W = U Σ V^T; fix U, V.
2. For t = 0 (base): Fine-tune ΔΣ_0 and ω on D^0.
3. For each increment t ≥ 1:
    a. Allocate adapter ΔΣ_t.
    b. Merge previous {ΔΣ_i}, reconstruct W_t = W + U Σ_merged V^T.
    c. Tune ΔΣ_t and ω on D^t; freeze rest.
    d. Update NCM prototypes for new classes.

SVFCL can also be formulated with explicit projection operators. For each session $t$ in a layer $l$ ,

$\Delta W_{t,\text{low}}^{(l)} = U^{(l)}_{\text{low}}\bigl(U^{(l)\top}_{\text{low}} \nabla W^{(l)}_t V^{(l)}_{\text{low}}\bigr)V^{(l)\top}_{\text{low}}$

where only the low-energy singular directions are allowed to be updated, ensuring orthogonality with preserved high-energy subspaces as in (Nayak et al., 9 Apr 2025).

2. Catastrophic Forgetting and Overfitting Mitigation

SVFCL’s update strategy offers explicit control over catastrophic forgetting and overfitting. By restricting fine-tuning to the singular values, updates remain within the span of the foundation model’s learned feature space (fixed $U$ and $V$ ), minimizing interference with representations learned for previous tasks. Theorem 1 in (Wang et al., 13 Mar 2025) demonstrates that the Frobenius norm of the SVFCL update is bounded above by comparable LoRA updates when targeting the same optimal perturbation, leading to reduced drift on prior classes.

Empirical results show that the train-validation gap is substantially smaller for SVFCL than for InfLoRA or L2P in few-shot sessions (Fig. 2, (Wang et al., 13 Mar 2025)), evidencing improved generalization and less overfitting. This is attributed to the significantly reduced VC dimension when only $O(r)$ singular values (with $r$ being the effective SVD rank) are updated, rather than $O(m+n)$ parameters as in LoRA, or the typically larger number in prompt tuning paradigms.

3. Parameter Efficiency and Hyperparameter Selection

SVFCL achieves high parameter efficiency, as it only introduces trainable elements proportional to the number of singular values present in the truncated SVD: | Fine-Tuning Method | Trainable Params (ViT-B/16) | Percentage of Full Model | |---------------------|-----------------------------|----------------------------------| | Full fine-tuning | 86.6M | 100% | | L2P prompt-tuning | 0.485M | 0.56% | | InfLoRA | 0.261M | 0.27% | | SVFCL | 0.095M | 0.11% |

Crucial hyperparameters include:

Learning rate: Adam optimizer with $\eta=5\times10^{-4}$ .
Number of epochs: 5 for the base session, 2 for each incremental few-shot session.
Model blocks adapted: Generally the first 7 transformer MLP blocks (0–6), balancing plasticity with retention of foundational representation.
SVD truncation rank $r'$ : Typically in the range 200–500 to preserve $\geq99\%$ performance (Fig. 8, (Wang et al., 13 Mar 2025)). Adapting too small a subset of singular values (e.g., $r'=50$ ) degrades performance.

For the Sculpting Subspaces variant (Nayak et al., 9 Apr 2025), rank selection can be guided by energy-preserving criteria ( $\sum_{i=1}^k \sigma_i / \sum_{i=1}^r \sigma_i \geq \phi$ with $\phi \in [0.5,0.8]$ ) or by adaptive retention strategies based on per-layer importance.

4. Empirical Performance and Benchmarking

SVFCL demonstrates state-of-the-art results on diverse FSCIL benchmarks. On miniImageNet (5-way, 5-shot, 8 sessions), SVFCL obtains an average Top-1 accuracy of $96.3\%$ with a performance drop (PD) of $2.3\%$ from the base to final session. On CUB200-2011, corresponding values are $83.5\%$ (A_avg) and $4.5\%$ (PD), both best among considered baselines (Table 2, (Wang et al., 13 Mar 2025)). For ImageNet-R, SVFCL achieves $A_\text{avg} = 70.1\%$ and $PD=12.2\%$ , significantly outperforming the next best baseline FOSTER ( $A_\text{avg}=68.2\%$ , $PD=30.8\%$ ; see Table 3).

Session-wise Top-1 accuracy curves across benchmarks (Fig. 6) show the slowest performance decline with SVFCL compared to parameter-efficient methods and full fine-tuning.

On synthetic CIFAR-100 FSCIL splits (Nayak et al., 9 Apr 2025), a related SVFCL variant achieves average accuracy up to $68.7\%$ (starting from $82.0\%$ in the base session and ending with $56.8\%$ ) with 5% or less parameter overhead. Ablation studies indicate significant accuracy drops when $k/r < 0.2$ ; improvement plateaus beyond $k/r \approx 0.5$ .

5. Ablations and Component Analyses

Several ablations quantify the impact of SVFCL's design choices:

Blocks Adapted: Tuning the lower transformer MLP blocks (0–6) is optimal (Table 4, (Wang et al., 13 Mar 2025)).
Singular Value Rank: $r'=500$ yields near-maximum performance; further reduction harms accuracy (Fig. 8).
Fine-tuned Components: Freezing $U$ and $V$ while tuning only $\Sigma$ outperforms any combination where $U$ or $V$ is also tuned (Fig. 9).
These results underscore the importance of fixing the subspace orientation to preserve foundational knowledge, while controlling adaptation "capacity" via the singular scales.

6. Visualizations and Qualitative Insights

Attention maps (Fig. 1, (Wang et al., 13 Mar 2025)) illustrate that SVFCL focuses attention primarily on object regions, with less background distraction compared to full fine-tuning and other PEFT baselines—indicative of improved generalization. The pre-trained singular-value spectra of foundation model weights exhibit a long-tail distribution, with the majority of representational power concentrated in the top few hundred singular values; this justifies the empirical efficacy of low-rank tuning.

Overfitting curves (Fig. 2) reveal the smallest training vs. validation gap for SVFCL on CUB200-2011, confirming its strong generalization under few-shot learning stress.

7. Extensions, Orthogonality, and Best Practices

The Sculpting Subspaces variant (Nayak et al., 9 Apr 2025) extends singular value fine-tuning by enforcing explicit orthogonality constraints: updates are projected into the orthogonal complement of previously learned high-energy directions, with an optional spectral regularizer penalizing any leakage. This enables capacity expansion for new classes without interfering with earlier knowledge. Best practices include:

Conservative initial low-rank budgets (e.g., $k/r \approx 0.2$ ) with gradual expansion.
Periodic SVD recomputation (every 2–3 sessions) if compute permits.
Reliance on a well-trained base model and accurate initial SVDs.

SVFCL maintains a fixed model footprint across sessions (no new task-specific parameters), and is especially suitable when few-shot session sizes are small, leveraging the stability of the subspace-based constraints to balance plasticity with retention.

SVFCL exemplifies a singular value-focused, subspace-preserving strategy for few-shot, class-incremental scenarios, leading to superior generalization, minimal forgetting, and high parameter efficiency compared to both full fine-tuning and traditional parameter-efficient methods (Wang et al., 13 Mar 2025, Nayak et al., 9 Apr 2025).