Minor Component Adaptation (MiCA)

Updated 3 July 2026

MiCA is a parameter-efficient fine-tuning technique that restricts model adaptation to the minor singular subspace of weight matrices, enhancing knowledge transfer.
It employs spectral decomposition to isolate least significant singular vectors, reducing adapter parameters compared to methods like LoRA.
Empirical evaluations show up to a 5.9-fold improvement in knowledge acquisition, demonstrating robust performance in domain-specific adaptation.

Minor Component Adaptation (MiCA) is a parameter-efficient fine-tuning technique for large language models (LLMs) that restricts model adaptation to the minor singular subspace of pre-trained weight matrices. Unlike Low-Rank Adaptation (LoRA), whose update is not tied to the spectrum of the pre-trained weights, and unlike methods that adapt dominant (major) singular components, MiCA trains only along the least significant singular vectors. These directions are typically underutilized by standard pre-training, which allows more efficient and stable knowledge injection with a smaller adapter parameter footprint. Empirical evidence shows up to a 5.9-fold improvement in knowledge acquisition relative to LoRA under optimal hyperparameters, while requiring only 6–60% of the adapter parameters used by LoRA (Rüdiger et al., 2 Apr 2026).

1. Theoretical Basis and Notation

MiCA is rooted in the spectral decomposition of transformer layer weights. A weight matrix $W \in \mathbb{R}^{d \times d}$ (with possible generalization to $W \in \mathbb{R}^{d_\text{out} \times d_\text{in}}$ ) is decomposed via Singular Value Decomposition (SVD) as $W = U \Sigma V^\top$ , where $U \in \mathbb{R}^{d \times d}$ and $V \in \mathbb{R}^{d \times d}$ are orthogonal, and $\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_d)$ contains singular values sorted in descending order. The subspace spanned by the bottom- $r$ left singular vectors ( $u_{d-r+1}, \ldots, u_d$ ) defines the minor singular subspace. This low-energy region is conventionally under-utilized by pre-trained models.
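As a minimal sketch of the notation above, the following NumPy snippet extracts the bottom-$r$ left singular vectors of a small random matrix. The function name `minor_left_singular_vectors` and the toy sizes are illustrative assumptions, not part of the source.

```python
import numpy as np

def minor_left_singular_vectors(W, r):
    """Return the r left singular vectors of W with the smallest singular values."""
    # np.linalg.svd returns singular values in descending order, so the
    # minor directions are the last r columns of U.
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, -r:], S[-r:]

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))      # toy stand-in for a d x d layer weight
U_minor, sigma_minor = minor_left_singular_vectors(W, r=2)

# Projecting W onto the minor subspace leaves only the smallest singular values.
projected_sigmas = np.linalg.svd(U_minor.T @ W, compute_uv=False)
```

Here `projected_sigmas` matches `sigma_minor`, which is one way to see why this subspace is described as low-energy.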

2. MiCA Algorithmic Formulation

MiCA constrains all updates during fine-tuning to the minor singular subspace:

Subspace Selection: Fix a rank $r \ll d$ and extract $U_{\text{minor}} = U[:, d-r+1:d] \in \mathbb{R}^{d \times r}$ , the matrix of minor left singular vectors.
Adapter Parameterization: Introduce a trainable coefficient matrix $A \in \mathbb{R}^{r \times d}$, initialized to zero, and freeze $U_{\text{minor}}$ (denoted as $B$).
Update Rule: The adaptation to $W$ is constrained as

$\Delta W = \frac{\alpha}{r} B A = \frac{\alpha}{r} U_{\text{minor}} A$

with global scaling $\alpha / r$ for a scaling hyperparameter $\alpha$. The fine-tuned weight is $W' = W + \Delta W$, ensuring $\Delta W$ has rank at most $r$ and is contained entirely within the minor subspace.
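The rank and subspace properties stated above can be checked numerically. This is a sketch under stated assumptions: the update is taken to have the form $\Delta W = \frac{\alpha}{r} U_{\text{minor}} A$, where the LoRA-style factor $\alpha / r$, the value of `alpha`, and the toy sizes are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, alpha = 8, 2, 2.0              # toy sizes; alpha is an assumed scaling value
W = rng.standard_normal((d, d))      # frozen pre-trained weight (toy stand-in)

U, S, Vt = np.linalg.svd(W)
U_major, U_minor = U[:, :d - r], U[:, d - r:]   # both bases stay frozen

# Zero initialization: the adapted weight starts out equal to W.
A = np.zeros((r, d))
W_zero_init = W + (alpha / r) * U_minor @ A

# Stand-in for the coefficients after some training steps.
A = rng.standard_normal((r, d))
delta_W = (alpha / r) * U_minor @ A
W_tuned = W + delta_W

rank = np.linalg.matrix_rank(delta_W)
leak = np.abs(U_major.T @ delta_W).max()   # component outside the minor subspace
```

In this sketch `rank` is at most `r` and `leak` is zero up to floating-point error, because the columns of `delta_W` are linear combinations of the columns of `U_minor`.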

3. Optimization and Hyperparameter Regimes

MiCA fine-tuning involves grid search over:

Rank $r$: Typical values include 16, 32, 128
Learning Rate: e.g., $1 \times 10^{-4}$, $5 \times 10^{-4}$
Epochs: e.g., 4 or 8
Scaling $\alpha$: Usually set to a fixed value
Optional: LoRA-style dropout, weight decay, warmup ratio

During optimization, both the original weight $W$ and the minor-subspace basis $U_{\text{minor}}$ are frozen; only the adapter coefficient matrix $A$ is updated (using AdamW with weight decay and a cosine learning-rate schedule). Cross-entropy serves as the training loss for language modeling or multiple-choice QA. Gradients are clipped to a maximum norm of 1.0, and training uses bfloat16 (bf16) precision. No regularization is applied beyond the intrinsic rank constraint imposed by the parameterization.
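The search space and schedule described above could be laid out as follows. This is a hedged sketch using only the standard library: the dictionary keys, the helper names `cosine_lr` and `clip_scale`, and the linear-warmup variant are assumptions, and the learning rates are the two values that appear in the results table.

```python
import itertools
import math

# Hypothetical search space; keys are illustrative names, not an API.
grid = {
    "rank": [16, 32, 128],
    "learning_rate": [1e-4, 5e-4],
    "epochs": [4, 8],
}

# Cartesian product of all settings, one dict per run.
configs = [dict(zip(grid, values)) for values in itertools.product(*grid.values())]

def cosine_lr(step, total_steps, peak_lr, warmup_steps=0):
    """Linear warmup followed by cosine decay to zero (one common variant)."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))

def clip_scale(grad_norm, max_norm=1.0):
    """Factor applied to all gradients under max-norm clipping."""
    return min(1.0, max_norm / (grad_norm + 1e-6))
```

With these values the grid contains 12 configurations. A gradient norm of 4.0 would be scaled by roughly 0.25 under the maximum norm of 1.0.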

4. Empirical Evaluation and Comparative Analysis

Downstream Tasks

MiCA’s effectiveness was evaluated on continued pre-training and factual knowledge transfer using two principal benchmarks:

BLOGS dataset: Continued pre-training on 30 paraphrased blog posts, evaluated on BLOGS-MC (300 GPT-4-generated multiple-choice questions), TruthfulQA, and HellaSwag.
HISTORY dataset: Training on a 100,000-token German history monograph, with evaluation on HISTORY-MC (102 questions) and HellaSwag.

Methods Compared

Full Fine-Tuning (Full FT): Updating all model parameters
LoRA: Standard low-rank adaptation, optimizing both low-rank factors $A$ and $B$
MiCA: Only the coefficient matrix $A$ is trained, with the minor-subspace basis $B = U_{\text{minor}}$ frozen

Parameter and Compute Analysis

Model	Total Params	LoRA Adapter Params	MiCA Adapter Params	MiCA/LoRA %
Llama-2-7B	6,747M	67M ($r=128$)	4M ($r=16$)	6%
Qwen2.5-7B	7,626M	10M ($r=32$)	6M ($r=32$)	60%
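The Llama-2-7B row above can be reproduced with simple arithmetic under one assumption. Llama-2-7B has hidden size 4096 and 32 layers. That exactly two square 4096 x 4096 projections per layer are adapted is an assumption made here because it matches the reported counts; the text does not state it. The function names are illustrative.

```python
def lora_params(d_in, d_out, r):
    # LoRA trains both factors: A (r x d_in) and B (d_out x r).
    return r * (d_in + d_out)

def mica_params(d_in, d_out, r):
    # MiCA trains only A (r x d_in); the d_out x r basis is frozen.
    return r * d_in

# Assumed setup: two adapted square projections per layer (hypothetical).
d, layers, matrices_per_layer = 4096, 32, 2
n = layers * matrices_per_layer

lora_total = n * lora_params(d, d, r=128)   # 67,108,864
mica_total = n * mica_params(d, d, r=16)    # 4,194,304
ratio = mica_total / lora_total             # 0.0625
```

These totals round to 67M and 4M, a ratio of about 6%. The saving comes from two sources: MiCA trains one factor instead of two, and its best-performing rank here is smaller.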

Performance Results

Method	Model	BLOGS-MC	TruthfulQA	HellaSwag	$r$	LR	Epochs	Params
Baseline	Llama-2-7B-chat	56.18	34.79	60.40	—	—	—	6,747M
LoRA (optimal)	Llama-2-7B	58.28	35.47	60.41	128	1e-4	8	67M
MiCA (optimal)	Llama-2-7B	61.33	35.29	60.11	16	5e-4	4	4M
Baseline	Qwen2.5-7B	72.91	43.27	60.60	—	—	—	7,626M
LoRA (optimal)	Qwen2.5-7B	73.87	42.95	60.95	32	5e-4	4	10M
MiCA (optimal)	Qwen2.5-7B	75.63	43.38	61.62	32	5e-4	8	6M

MiCA achieves a 3.05-point absolute gain on BLOGS-MC over LoRA for Llama-2-7B (61.33 vs. 58.28) and a 1.76-point gain for Qwen2.5-7B (75.63 vs. 73.87), while using about 6% and 60% of LoRA's adapter parameters, respectively. Overall, MiCA demonstrates up to a 5.9-fold improvement in knowledge acquisition relative to LoRA under optimized hyperparameters, with a significantly reduced parameter footprint (Rüdiger et al., 2 Apr 2026).
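The point gains quoted above follow directly from the BLOGS-MC column of the results table. The short sketch below recomputes them and also reports each method's gain over the untuned baseline. The `gain_ratio` field is an illustrative quantity computed here for BLOGS-MC only. It is not the 5.9-fold figure, whose derivation is not given in this section.

```python
# BLOGS-MC accuracies copied from the results table.
blogs_mc = {
    "Llama-2-7B": {"baseline": 56.18, "lora": 58.28, "mica": 61.33},
    "Qwen2.5-7B": {"baseline": 72.91, "lora": 73.87, "mica": 75.63},
}

def summarize(scores):
    gain_lora = scores["lora"] - scores["baseline"]
    gain_mica = scores["mica"] - scores["baseline"]
    return {
        "mica_minus_lora": round(scores["mica"] - scores["lora"], 2),
        "gain_ratio": round(gain_mica / gain_lora, 2),
    }

summary = {model: summarize(s) for model, s in blogs_mc.items()}
```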

5. Ablation, Convergence, and Spectral Insights

Empirical analysis confirms that updates in the minor singular subspace are particularly effective for domain-specific knowledge injection:

Spectral Grounding: Confining updates to low-energy directions (minor singular vectors) helps prevent overwriting dominant model components that encode generic pre-trained knowledge.
Empirical Stability: MiCA’s learning curves indicate more rapid and stable convergence compared to LoRA or random subspace baselines.
Ablation Study (Qwen2.5-7B, $r$=32):

| Adaptation | BLOGS-MC Accuracy |
|--------------------------|-------------------|
| No FT (Instruct) | 72.91 |
| Major-r Adaptation | 74.21 |
| Random Subspace (rank $r$) | 73.75 |
| Minor-r (MiCA) | 75.63 |

Minor singular directions outperform both major and random subspace adaptations, supporting the hypothesis that the least expressive directions are best suited for domain-specific adaptation.
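The three adaptation subspaces compared in the ablation can be constructed as in the sketch below. It is a toy illustration with assumed names (`subspace_basis`, `captured_energy`). Orthonormalizing a Gaussian matrix is only one possible reading of "random subspace" and may differ from the ablation's actual construction.

```python
import numpy as np

def subspace_basis(W, r, kind, rng=None):
    """Return an orthonormal d_out x r basis for the chosen adaptation subspace."""
    U, _, _ = np.linalg.svd(W, full_matrices=False)
    if kind == "major":
        return U[:, :r]          # top-r left singular vectors
    if kind == "minor":
        return U[:, -r:]         # bottom-r left singular vectors (MiCA)
    if kind == "random":
        # Orthonormalize a Gaussian matrix via reduced QR.
        Q, _ = np.linalg.qr(rng.standard_normal((W.shape[0], r)))
        return Q
    raise ValueError(f"unknown kind: {kind}")

def captured_energy(W, B):
    """Fraction of the squared Frobenius norm of W lying in the span of B."""
    return np.linalg.norm(B.T @ W) ** 2 / np.linalg.norm(W) ** 2

rng = np.random.default_rng(0)
W = rng.standard_normal((16, 16))
bases = {k: subspace_basis(W, 4, k, rng) for k in ("major", "minor", "random")}
energy = {k: captured_energy(W, B) for k, B in bases.items()}
```

The minor basis captures the least of the weight's energy and the major basis the most, with a random basis in between. This is the sense in which minor directions are "low-energy"; the sketch says nothing about downstream accuracy.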

6. Implementation and Practical Considerations

MiCA requires only a single SVD per layer, after which the minor component basis $U_{\text{minor}}$ remains frozen. A single-layer adaptation proceeds as follows: compute the SVD of the frozen weight $W$, keep the $r$ left singular vectors with the smallest singular values as $U_{\text{minor}}$, initialize the trainable coefficient matrix to zero, and add the scaled low-rank update to $W$ in the forward pass.

Integration is straightforward with transformer frameworks (e.g., Hugging Face Transformers and PEFT), requiring only replacement of LoRA modules with MiCA modules in the adapted weight matrices. SVD and minor vector extraction incur only a one-time cost per layer. MiCA’s parameter and computational efficiency makes it amenable to federated learning and on-device adaptation settings.
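The single-layer procedure described in this section can be sketched in NumPy as follows. This is a toy illustration, not the authors' implementation: the class name `MiCALinear`, the default of `alpha = r`, and the use of plain SGD on a squared-error loss (in place of AdamW on cross-entropy) are all assumptions made to keep the example self-contained.

```python
import numpy as np

class MiCALinear:
    """Toy MiCA-style layer: y = x @ (W + scale * U_minor @ A).T"""

    def __init__(self, W, r, alpha=None):
        U, _, _ = np.linalg.svd(W, full_matrices=False)   # one-time SVD
        self.W = W                                        # frozen weight
        self.U_minor = U[:, -r:]                          # frozen basis, d_out x r
        self.A = np.zeros((r, W.shape[1]))                # trainable, r x d_in
        self.scale = (alpha if alpha is not None else r) / r   # assumed alpha / r

    def forward(self, x):
        # Base projection plus low-rank correction, without forming the full update.
        return x @ self.W.T + self.scale * (x @ self.A.T) @ self.U_minor.T

    def sgd_step(self, x, target, lr):
        """One SGD step on mean squared error; only A changes."""
        err = self.forward(x) - target                    # batch x d_out
        grad_A = self.scale * (err @ self.U_minor).T @ x / len(x)
        self.A -= lr * grad_A
        return 0.5 * np.mean(np.sum(err ** 2, axis=1))    # loss before the step

rng = np.random.default_rng(0)
W = rng.standard_normal((6, 6))
layer = MiCALinear(W, r=2)
x = rng.standard_normal((32, 6))
# Synthetic target whose difference from W lies in the minor subspace.
shift = 0.1 * np.outer(layer.U_minor[:, 0], rng.standard_normal(6))
target = x @ (W + shift).T
loss_before = layer.sgd_step(x, target, lr=0.1)
loss_after = layer.sgd_step(x, target, lr=0.1)
```

Because `A` starts at zero, the layer initially reproduces the frozen weight exactly. The forward pass applies the correction as two thin matrix products, so the full `d_out x d_in` update is never materialized.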

7. Summary and Significance

Minor Component Adaptation is a parameter-efficient fine-tuning methodology that exploits the latent capacity of minor singular directions in model weight matrices. By focusing adaptation within these subspaces, MiCA integrates new factual knowledge more efficiently than both full fine-tuning and LoRA, with empirically demonstrated superiority in both learning efficiency and model stability. The constraint to minor singular directions prevents interference with core model capabilities, providing an effective mechanism for domain adaptation with a minimal parameter and compute footprint (Rüdiger et al., 2 Apr 2026).


References (1)

MiCA Learns More Knowledge Than LoRA and Full Fine-Tuning (2026)
