KRAdapter: Efficient High-Rank PEFT

Updated 13 November 2025

KRAdapter is a parameter-efficient fine-tuning method that uses the Khatri–Rao product to create high effective-rank updates for complex, high-frequency data.
It improves out-of-distribution generalization and robustness in both vision-language models and large language models while retaining efficient memory and compute usage.
Empirical evaluations show KRAdapter achieves flatter singular value spectra and lower nuclear-norm errors compared to traditional low-rank methods like LoRA.

KRAdapter is a parameter-efficient fine-tuning (PEFT) algorithm designed to increase the representational capacity of weight updates in large pre-trained neural networks, particularly where low-rank adaptation methods such as LoRA are insufficient, for example when the target update has high effective rank or intricate spectral properties. By leveraging the Khatri–Rao product (a column-wise Kronecker product), KRAdapter increases the effective rank of learned updates while retaining the memory and compute profile typical of current PEFT approaches. KRAdapter demonstrates performance gains on both vision-language models and LLMs, with particular strength in out-of-distribution (OOD) generalization, and remains computationally efficient at the scale of billion-parameter architectures (Albert et al., 1 Aug 2025).

1. Parameter-efficient Fine-tuning Formulation

In the canonical PEFT setting, one begins with a pre-trained weight matrix $W_0 \in \mathbb{R}^{d_\text{out} \times d_\text{in}}$. Fine-tuning introduces a small trainable update $\Delta W$ so that for an input $x \in \mathbb{R}^{d_\text{in}}$, the model computes

$h = (W_0 + \Delta W)x.$

Full fine-tuning makes all $d_\text{out} \times d_\text{in}$ entries of $\Delta W$ trainable, whereas LoRA restricts $\Delta W$ to be rank-$r$:

$\Delta W = B A,\quad A \in \mathbb{R}^{r \times d_\text{in}},\; B \in \mathbb{R}^{d_\text{out} \times r},$

training only $A$ and $B$, typically with $r \ll \min(d_\text{out}, d_\text{in})$. The limitation of LoRA arises when $\Delta W$ must approximate full-rank, high-frequency, or high effective rank matrices, a situation common in multi-modal and OOD tasks.
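The LoRA formulation above can be sketched in a few lines of NumPy. This is a minimal illustration on toy sizes, not an implementation from the paper; the dimensions, seed, and variable names are assumptions chosen for readability.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 64, 48, 4               # toy sizes, chosen only for illustration

W0 = rng.standard_normal((d_out, d_in))  # frozen pre-trained weight
A = rng.standard_normal((r, d_in))       # trainable factor, r x d_in
B = rng.standard_normal((d_out, r))      # trainable factor, d_out x r
x = rng.standard_normal(d_in)

delta_W = B @ A                          # LoRA update, rank at most r
h = (W0 + delta_W) @ x                   # h = (W0 + delta_W) x

# Equivalent form that never materializes the d_out x d_in update
h_factored = W0 @ x + B @ (A @ x)

lora_rank = int(np.linalg.matrix_rank(delta_W))
lora_params = A.size + B.size            # r * (d_out + d_in)
full_params = d_out * d_in               # full fine-tuning trains every entry
print(lora_rank, lora_params, full_params)
```

However the factors are trained, the rank of `delta_W` cannot exceed `r`, which is the constraint the rest of the article is concerned with.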

2. Mathematical Construction of KRAdapter

KRAdapter parameterizes updates via the Khatri–Rao product. Let $U \in \mathbb{R}^{k_1 \times d_\text{in}}$ and $V \in \mathbb{R}^{k_2 \times d_\text{in}}$ be trainable matrices, with $k_1 k_2 \geq d_\text{out}$. The Khatri–Rao product ($U \odot V$) is defined column-wise: for each $j=1, \ldots, d_\text{in}$,

$(U \odot V)_j = u_j \otimes v_j \in \mathbb{R}^{k_1 k_2},$

with $u_j$ and $v_j$ denoting the $j$-th columns of $U$ and $V$. Stacking these for all $j$,

$U \odot V = [\,u_1\otimes v_1,\,\ldots,\,u_{d_\text{in}} \otimes v_{d_\text{in}}\,] \in \mathbb{R}^{k_1 k_2 \times d_\text{in}}.$

The update is then constructed as:

$\Delta W = \text{reshape}[(U \odot V)]_{[1:d_\text{out}, :]},$

keeping only the first $d_\text{out}$ rows when $k_1 k_2 > d_\text{out}$. A scalar $\alpha$ (e.g., $\alpha=0.1$ for vision models) scales the update, and the forward pass is $h = (W_0 + \alpha\Delta W)x$.
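As a rough sketch of this construction, the following NumPy code builds the Khatri–Rao product column by column (via `einsum`), truncates it to $d_\text{out}$ rows, and applies the scaled update. The helper names `khatri_rao` and `kr_update` and the toy sizes are hypothetical; the paper's actual implementation may organize this differently.

```python
import numpy as np

def khatri_rao(U, V):
    """Column-wise Kronecker product: column j is kron(U[:, j], V[:, j])."""
    k1, d_in = U.shape
    k2, d_in_v = V.shape
    if d_in != d_in_v:
        raise ValueError("U and V must have the same number of columns")
    # out[i, k, j] = U[i, j] * V[k, j]; flattening (i, k) gives row i * k2 + k
    return np.einsum("ij,kj->ikj", U, V).reshape(k1 * k2, d_in)

def kr_update(U, V, d_out):
    """Keep the first d_out rows of the Khatri-Rao product as the update."""
    kr = khatri_rao(U, V)
    if kr.shape[0] < d_out:
        raise ValueError("need k1 * k2 >= d_out")
    return kr[:d_out, :]

rng = np.random.default_rng(0)
d_out, d_in, k1, k2, alpha = 30, 20, 6, 6, 0.1   # toy sizes (assumed)
W0 = rng.standard_normal((d_out, d_in))          # frozen weight
U = rng.standard_normal((k1, d_in))              # trainable
V = rng.standard_normal((k2, d_in))              # trainable
x = rng.standard_normal(d_in)

delta_W = kr_update(U, V, d_out)                 # shape (d_out, d_in)
h = (W0 + alpha * delta_W) @ x                   # forward pass with scaling
print(delta_W.shape, h.shape)
```

Here $k_1 k_2 = 36 \geq d_\text{out} = 30$, so six rows of the product are discarded by the truncation.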

This formulation, by construction, produces an update with high effective rank:

With random (i.i.d.) $U, V$, the columns of $U \odot V$ are almost surely linearly independent if $k_1 = k_2 = k$ and $k^2 \geq d_\text{in}$.
Empirically, $U \odot V$ yields a much flatter singular value spectrum than LoRA or Kronecker-product adapters.
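The rank claim can be checked numerically on a toy example. The sketch below draws Gaussian $U$ and $V$, forms the truncated Khatri–Rao update, and compares its numerical rank with that of a random LoRA update of similar parameter count. All sizes are assumptions for illustration, and the comparison covers rank only, not the full singular value spectrum.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, k = 64, 48, 8           # toy sizes with k * k >= d_in and k * k >= d_out

U = rng.standard_normal((k, d_in))
V = rng.standard_normal((k, d_in))
# Khatri-Rao product: column j equals kron(U[:, j], V[:, j])
kr = np.einsum("ij,kj->ikj", U, V).reshape(k * k, d_in)
delta_kr = kr[:d_out, :]

r = 7                                # LoRA rank with a comparable parameter count
delta_lora = rng.standard_normal((d_out, r)) @ rng.standard_normal((r, d_in))

kr_params = U.size + V.size          # d_in * (k + k) = 768
lora_params = r * (d_out + d_in)     # 7 * 112 = 784
kr_rank = int(np.linalg.matrix_rank(delta_kr))
lora_rank = int(np.linalg.matrix_rank(delta_lora))
print("KRAdapter:", kr_params, "params, rank", kr_rank)
print("LoRA:     ", lora_params, "params, rank", lora_rank)
```

With these sizes the Khatri–Rao update can reach rank $\min(d_\text{out}, d_\text{in})$, while the LoRA update is capped at $r$ despite using slightly more parameters.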

3. Spectral Properties and Effective Rank

Low-rank LoRA updates have singular values dropping sharply to zero after the $r^{\text{th}}$ component, limiting their expressivity for high-rank matrix approximation.

KRAdapter, in contrast, delivers updates with near-full rank and slow spectral decay. Effective rank, defined as

$r_\text{eff}(M) = \exp\left(-\sum_i p_i \log p_i\right),\quad p_i = \frac{\sigma_i}{\sum_j \sigma_j}$

with $\{\sigma_i\}$ the singular values of $M$, is consistently higher with KRAdapter than LoRA, SinLoRA, RandLoRA, or Kronecker adapters (Albert et al., 1 Aug 2025). Synthetic benchmarks with diverse spectra (random Gaussian, PCA-whitened, high/low-frequency sinusoids, CLIP weight-deltas) confirm that KRAdapter matches LoRA on strictly low-rank targets but substantially outperforms on high-rank and high-frequency scenarios.
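The effective-rank definition above translates directly into code. The following is a minimal sketch; the function name `effective_rank` and the small threshold used to drop zero singular values (so that $0 \log 0$ is treated as $0$) are assumptions of this example.

```python
import numpy as np

def effective_rank(M, eps=1e-12):
    """exp of the Shannon entropy of the normalized singular values of M."""
    s = np.linalg.svd(M, compute_uv=False)
    total = s.sum()
    if total == 0.0:
        return 0.0                   # convention for the zero matrix (assumed)
    p = s / total
    p = p[p > eps]                   # drop zeros so that 0 * log(0) counts as 0
    return float(np.exp(-(p * np.log(p)).sum()))

flat = np.eye(8)                                 # eight equal singular values
rank_one = np.outer(np.arange(1.0, 9.0), np.ones(8))
print(effective_rank(flat))      # close to 8: perfectly flat spectrum
print(effective_rank(rank_one))  # close to 1: a single nonzero singular value
```

A flat spectrum maximizes the entropy and gives an effective rank equal to the matrix dimension, while a spectrum concentrated on one singular value gives an effective rank of one. This is why a flatter spectrum is read as higher effective rank.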

4. Computational Efficiency and Implementation

KRAdapter is designed to match or minimally exceed the compute and memory profiles of LoRA:

Number of parameters: $N_\text{KR} = d_\text{in}(k_1 + k_2)$, minimized under the constraint $k_1 k_2 \geq d_\text{out}$ at $k_1 = k_2 = \lceil\sqrt{d_\text{out}}\rceil$ (rounded up so that the constraint holds).
For $k_1=k_2=\sqrt{d_\text{out}}$, $N_\text{KR} \approx 2\,d_\text{in}\sqrt{d_\text{out}}$.
LoRA with rank $r$ needs $N_\text{LoRA} = r(d_\text{out} + d_\text{in})$ parameters; $N_\text{KR} \approx N_\text{LoRA}$ commonly holds for $r = 16$–$32$.
The extra forward-pass cost is one $d_\text{out}\times d_\text{in}$ matrix–vector multiplication, which is negligible next to the cost of $W_0 x$ ($<1$ ms on 1B-parameter models).
Training speed and VRAM usage are within $1$–$5\%$ of LoRA.

The update is efficiently realized by reshaping and stacking columns, exploiting Khatri–Rao structure for high throughput.
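The parameter counts above can be worked through with plain arithmetic. In this sketch the helper names are hypothetical, $k_1 = k_2$ is rounded up so that $k_1 k_2 \geq d_\text{out}$ holds, and the layer widths (768 and 1024) are example values rather than figures from the paper.

```python
import math

def kr_param_count(d_out, d_in):
    """d_in * (k1 + k2) with k1 = k2 = ceil(sqrt(d_out))."""
    k = math.isqrt(d_out - 1) + 1    # integer ceil(sqrt(d_out)) for d_out >= 1
    return d_in * (k + k)

def lora_param_count(d_out, d_in, r):
    """r * (d_out + d_in) for a rank-r LoRA update."""
    return r * (d_out + d_in)

for d in (768, 1024):                # example square layers (assumed widths)
    n_kr = kr_param_count(d, d)
    matching_r = n_kr / (d + d)      # LoRA rank with the same budget
    print(d, n_kr, matching_r)
```

For a square $1024 \times 1024$ layer this gives $k = 32$ and $N_\text{KR} = 1024 \cdot 64 = 65{,}536$, the same budget as LoRA with $r = 32$.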

5. Empirical Evaluation and Benchmarks

KRAdapter has been extensively benchmarked:

Synthetic Matrix Approximation

Benchmarks use matrices with controlled spectral profiles (Gaussian, sparse, decorrelated, low-rank, CLIP-deltas, superposed sinusoids).
KRAdapter uniformly outperforms LoRA except on strictly low-rank cases and provides the flattest spectrum approximation (lowest nuclear-norm error relative to LoRA).
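To illustrate why a rank constraint matters on such targets, the sketch below computes the nuclear-norm error of the best possible rank-$r$ approximation (the truncated SVD, by the Eckart–Young–Mirsky theorem) for a low-rank and a full-rank Gaussian target. This is not the paper's benchmark: it only gives a lower bound on the error of any rank-$r$ update such as LoRA, with toy sizes and helper names chosen for this example.

```python
import numpy as np

def nuclear_norm(M):
    """Sum of singular values."""
    return float(np.linalg.svd(M, compute_uv=False).sum())

def best_rank_r(T, r):
    """Truncated SVD: the best rank-r approximation of T."""
    left, s, right_t = np.linalg.svd(T, full_matrices=False)
    return (left[:, :r] * s[:r]) @ right_t[:r, :]

def relative_error(T, r):
    return nuclear_norm(T - best_rank_r(T, r)) / nuclear_norm(T)

rng = np.random.default_rng(0)
d, r = 64, 8                                     # toy sizes (assumed)
low_rank_target = rng.standard_normal((d, r)) @ rng.standard_normal((r, d))
high_rank_target = rng.standard_normal((d, d))   # full rank almost surely

err_low = relative_error(low_rank_target, r)     # essentially zero
err_high = relative_error(high_rank_target, r)   # most of the spectrum is missed
print(err_low, err_high)
```

On the low-rank target the rank-$r$ approximation is exact up to rounding, whereas on the Gaussian target even the optimal rank-$r$ update leaves most of the nuclear norm unexplained.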

Vision-Language Models

Fine-tuned on CLIP variants (ViT-B/32, ViT-L/14, ViT-H/14) across 11 few-shot datasets, ImageNet (50%/100%), and VTAB1k (Natural, Structured, Specialized).
On 11 classical vision tasks, KRAdapter exceeds LoRA and other adapters by $1$–$2\%$.
For out-of-distribution (OOD) robustness, the generalization ratio $r_\text{gen} = \Delta_\text{OOD} / \Delta_\text{ID}$ is $0.45$ for KRAdapter on ViT-B/32, compared to $0.27$ for LoRA.
KRAdapter’s updates show smaller nuclear/Frobenius norm shifts from zero-shot, correlating with greater robustness.

LLMs

Applied to Llama3.1-8B and Qwen2.5-7B (adapters on key/value projections).
Trained on science QA datasets (SIQA, ARC-E, ARC-C, OBQA), with evaluation on in-distribution, near-distribution (HellaSwag), and OOD (BoolQ, PIQA, WinoGrande).
KRAdapter achieves the highest average OOD scores: e.g., Llama3 OOD $64.66\%$ for KRAdapter versus $55.62\%$ for LoRA, $61.37\%$ for KronA.

6. Hyperparameters, Practical Use, and Limitations

Typical hyperparameters:

$k_1=k_2=\lceil\sqrt{d_\text{out}}\rceil$, rounded up so that $k_1 k_2 \geq d_\text{out}$ (trades parameter count against approximation quality).
Scaling $\alpha=0.1$ (vision) or $\alpha=2$ with reweighting for LLMs.
Learning rates: $10^{-2}$ (matrix toy), $10^{-4}$ (CLIP/LLM), AdamW optimizer.
Dropout $p=0.05$ (LLMs) to regularize adapters.
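The settings listed above could be collected in a small configuration object. The class and field names below are hypothetical and do not come from the paper or any library; values the text does not specify (such as dropout for vision models, or the details of the LLM reweighting) are left unset rather than guessed.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class KRAdapterHParams:
    """Hypothetical container; field names are not from the paper."""
    alpha: float                      # scale applied to the update
    learning_rate: float
    optimizer: str = "AdamW"
    dropout: Optional[float] = None   # None where the text gives no value

# Values as listed in the text
VISION = KRAdapterHParams(alpha=0.1, learning_rate=1e-4)
# alpha=2 is used "with reweighting" for LLMs; the reweighting is not modeled here
LLM = KRAdapterHParams(alpha=2.0, learning_rate=1e-4, dropout=0.05)
TOY_MATRIX_LEARNING_RATE = 1e-2       # synthetic matrix-approximation experiments

print(VISION)
print(LLM)
```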

KRAdapter’s minimum parameter budget is $\approx 2\,d_\text{in}\sqrt{d_\text{out}}$, which exceeds that of very low-rank LoRA setups (for example $r=1$). For scenarios requiring tight rank constraints or extreme compactness, LoRA may be preferred. KRAdapter is suboptimal on strictly low-rank targets and, in some cases, full-rank random parametrizations match or slightly exceed its in-distribution performance on large models.

Future research directions include deploying nested low-rank decompositions for $U$ and $V$, formal study of Khatri–Rao spectral shaping under realistic initializations, and extension to convolutional or other structured layers.

7. Significance and Broader Applicability

KRAdapter advances PEFT by enabling high effective-rank updates while efficiently utilizing parameters and compute. Its spectral properties enhance generalization, especially for tasks involving distribution shift (OOD), multi-modal, and compositional learning. Unlike strictly low-rank approaches, KRAdapter’s update parametrization, via the Khatri–Rao product, offers a theoretically and empirically justified trade-off between resource footprint and expressive adaptation. Empirical studies demonstrate superiority over LoRA and alternative adapters on both vision-language and LLM benchmarks, with particular gains in robustness and OOD accuracy. This approach is particularly relevant for evolving PEFT requirements as models and deployment contexts diversify (Albert et al., 1 Aug 2025).


References (1)

Albert et al. (2025). Towards Higher Effective Rank in Parameter-efficient Fine-tuning using Khatri–Rao Product.
