DoRAN: Adaptive PEFT for Foundation Models
- DoRAN is a PEFT method that adapts large-scale models using dynamic low-rank parameter generation via adaptive noise injection and auxiliary networks.
- It introduces a learnable regularizer that stabilizes gradient updates and interpolates between LoRA- and DoRA-style behavior.
- Experimental results demonstrate improved accuracy and sample efficiency across vision and language tasks with minimal additional parameter cost.
DoRAN is a parameter-efficient fine-tuning (PEFT) method for large-scale foundation models that augments the Weight-Decomposed Low-Rank Adaptation (DoRA) approach through adaptive noise injection and auxiliary (hyper) networks for dynamic low-rank parameter generation. DoRAN specifically addresses stability and sample efficiency issues inherent in prior low-rank adaptation methods such as LoRA and DoRA, demonstrating improved empirical performance across both vision and language tasks.
1. Foundations: PEFT, LoRA, and DoRA
Parameter-efficient fine-tuning (PEFT) enables adaptation of overparameterized models with minimal trainable parameters by introducing structured, low-rank updates, typically without modifying the core (pre-trained) weights. LoRA modifies the original weight $W_0 \in \mathbb{R}^{d \times k}$ of a layer via

$$W' = W_0 + \Delta W = W_0 + BA,$$

where $B \in \mathbb{R}^{d \times r}$ and $A \in \mathbb{R}^{r \times k}$ are low-rank matrices ($r \ll \min(d, k)$) with trainable parameters. DoRA advances this by explicitly decomposing the weight into magnitude and directional components:

$$W' = m \, \frac{W_0 + BA}{\|W_0 + BA\|_c},$$

where $m$ is a (learnable) magnitude/scaling vector, $\|\cdot\|_c$ denotes the column-wise norm, and the denominator normalizes the update direction, aiming to better approximate full fine-tuning dynamics.
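As a concrete illustration, the two baseline updates above can be sketched in NumPy. The dimensions, initialization, and column-wise norm convention here are illustrative assumptions, not the paper's exact setup:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r = 8, 6, 2                   # layer dims and low rank
W0 = rng.standard_normal((d, k))    # frozen pre-trained weight
B = np.zeros((d, r))                # LoRA convention: B starts at zero
A = rng.standard_normal((r, k))

# LoRA: additive low-rank update, W' = W0 + B @ A
W_lora = W0 + B @ A

# DoRA: decompose into a magnitude vector m and a normalized direction.
# m is a per-column learnable scale (initialized here from W0's column norms).
V = W0 + B @ A
col_norms = np.linalg.norm(V, axis=0, keepdims=True)  # column-wise norms
m = np.linalg.norm(W0, axis=0, keepdims=True)
W_dora = m * (V / col_norms)

# With B = 0, both updates reduce to the pre-trained weight W0.
assert np.allclose(W_lora, W0)
assert np.allclose(W_dora, W0)
```

Because `B` is initialized to zero, both adapted weights start exactly at the pre-trained `W0`, so fine-tuning begins from the original model's behavior.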
However, two primary limitations are observed in DoRA:
- The normalization denominator may become small, resulting in gradient instability (potentially exploding gradients).
- The use of layer-local, static low-rank matrices can restrict sample efficiency and prevent sharing of adaptation information across layers.
2. DoRAN Core Algorithm: Stabilization and Network-Based Parameterization
DoRAN introduces two central modifications:
2.1 Noise Injection and Adaptive Regularization
To stabilize normalization, DoRAN adds a learnable positive regularizer $\epsilon > 0$ to the denominator:

$$W' = m \, \frac{W_0 + BA}{\|W_0 + BA\|_c + \epsilon}.$$

Here, $\epsilon$ serves as an adaptive noise buffer, reducing sensitivity to near-zero norms:
- If $\epsilon$ is small, DoRAN approaches DoRA-like behavior, primarily learning directional updates.
- For large $\epsilon$, normalization is relaxed, approaching the unnormalized update regime of LoRA.
This controlled interpolation manages the gradient's parallel and orthogonal components (with respect to the direction $W_0 + BA$), guarding against vanishing denominators and providing stable learning dynamics, as formalized in the paper's gradient analysis.
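A minimal sketch of this interpolation, assuming the regularizer (denoted $\epsilon$ here) is added directly to the column-wise norm in the denominator:

```python
import numpy as np

def doran_weight(W0, B, A, m, eps):
    """DoRAN-style update: column-wise normalization with a learnable
    positive regularizer eps added to the denominator (a sketch; the
    symbol eps and the column-wise norm convention are assumptions)."""
    V = W0 + B @ A
    norms = np.linalg.norm(V, axis=0, keepdims=True)
    return m * V / (norms + eps)

rng = np.random.default_rng(1)
W0 = rng.standard_normal((8, 6))
B, A = rng.standard_normal((8, 2)), rng.standard_normal((2, 6))
m = np.linalg.norm(W0, axis=0, keepdims=True)

# eps -> 0: recovers DoRA's fully normalized update.
V = W0 + B @ A
dora = m * V / np.linalg.norm(V, axis=0, keepdims=True)
assert np.allclose(doran_weight(W0, B, A, m, 1e-12), dora)

# Large eps: normalization is relaxed, and the update scales like the
# raw (LoRA-style) matrix V up to a near-constant factor m / eps.
big = doran_weight(W0, B, A, m, 1e6)
assert np.allclose(big, (m / 1e6) * V, rtol=1e-5)
```

The two assertions check the limiting behavior described above: tiny $\epsilon$ reproduces DoRA, while very large $\epsilon$ effectively removes the normalization.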
2.2 Auxiliary Networks for Dynamic Low-Rank Generation
DoRAN replaces per-layer static low-rank matrices with small auxiliary feedforward networks (hypernetworks) $g_B$ and $g_A$, mapping a shared latent embedding $z_\ell$ for each layer $\ell$ to low-rank factors:

$$B_\ell = g_B(z_\ell), \qquad A_\ell = g_A(z_\ell).$$

Consequently, the low-rank update in each layer is generated dynamically, with shared hypernetwork parameters enabling coupling of adaptation information across layers and attention heads. This structural coupling promotes greater sample efficiency, especially under data scarcity, while retaining model expressiveness.
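A toy sketch of shared hypernetwork generation. The single-linear-map form of the networks, the latent dimension, and the names `H_A`/`H_B` are assumptions for illustration; the paper's auxiliary networks may be small MLPs:

```python
import numpy as np

rng = np.random.default_rng(2)
d, k, r, z_dim, L = 8, 6, 2, 4, 3   # dims, rank, latent size, num layers

# Shared hypernetwork weights: one map per factor, reused by every layer.
H_B = rng.standard_normal((z_dim, d * r)) * 0.1
H_A = rng.standard_normal((z_dim, r * k)) * 0.1

def generate_low_rank(z):
    """Map a per-layer latent embedding z to low-rank factors (B, A)."""
    B = (z @ H_B).reshape(d, r)
    A = (z @ H_A).reshape(r, k)
    return B, A

# Each layer has its own small latent code, but H_B / H_A are shared,
# coupling adaptation information across layers.
latents = rng.standard_normal((L, z_dim))
factors = [generate_low_rank(z) for z in latents]
for B, A in factors:
    assert B.shape == (d, r) and A.shape == (r, k)
```

The trainable state per layer shrinks to a small latent code, while the shared maps let gradient signal from one layer's adaptation inform the others.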
3. Mathematical Formulation and Gradient Behavior
The full DoRAN update for a linear (or affine) layer $\ell$ is:

$$W'_\ell = m_\ell \, \frac{W_0^{(\ell)} + B_\ell A_\ell}{\left\|W_0^{(\ell)} + B_\ell A_\ell\right\|_c + \epsilon}, \qquad B_\ell = g_B(z_\ell), \quad A_\ell = g_A(z_\ell).$$
Gradient analysis reveals:
- The update decomposes into parallel (norm scaling) and orthogonal (directional) contributions.
- The regularizer $\epsilon$ adaptively dampens both components, preventing instability and facilitating robust learning.
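The parallel/orthogonal split referenced above can be checked numerically for a single column. This is a generic projection identity, not the paper's full gradient derivation:

```python
import numpy as np

rng = np.random.default_rng(3)
v = rng.standard_normal(8)   # one column of V = W0 + B A
g = rng.standard_normal(8)   # gradient w.r.t. that column

# Decompose the gradient into a component parallel to v (which changes
# the column's norm/magnitude) and an orthogonal component (which
# changes its direction).
u = v / np.linalg.norm(v)
g_par = (g @ u) * u
g_orth = g - g_par

assert np.allclose(g_par + g_orth, g)   # exact decomposition
assert abs(g_orth @ v) < 1e-9           # orthogonal part leaves the norm alone
```

Normalization-based methods like DoRA act differently on these two pieces; DoRAN's $\epsilon$ tempers how aggressively the parallel (norm-scaling) piece is suppressed.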
4. Experimental Evaluation
Benchmarking covers both vision and language adaptation scenarios.
4.1 Vision: VTAB-1K and FGVC
- DoRAN is instantiated atop a ViT-B/16 backbone pre-trained on ImageNet-21K.
- On the VTAB-1K benchmark, adding only the stabilizing regularizer $\epsilon$ (an "$\epsilon$-DoRA" variant) increases average accuracy by 0.5% over DoRA, while inclusion of auxiliary networks yields up to 1.8% gains.
- Fine-grained visual categorization (FGVC) shows similar improvements with minimal parameter overhead (roughly 0.09% additional trainable parameters compared to DoRA).
4.2 Language: Commonsense Reasoning
- Tasks encompass eight commonsense benchmarks (e.g., BoolQ, PIQA, HellaSwag, ARC-c) with LLaMA-7B and LLaMA-13B.
- DoRAN surpasses LoRA and DoRA by 1–2% in accuracy and demonstrates substantially improved sample efficiency, particularly in low-data regimes.
5. Theoretical and Practical Implications
- Noise injection via the learnable regularizer $\epsilon$ offers tunable regularization, interpolating stably between LoRA and DoRA behaviors.
- Auxiliary network parameterization enables cross-layer sharing and enhances data efficiency, facilitating robust adaptation with minimal added compute or parameters.
- DoRAN’s construction allows theoretically controlled trade-offs between magnitude and direction adaptation, bridging the rigidity of DoRA with the scale flexibility of LoRA.
A plausible implication is that DoRAN’s two-stage approach—adaptive regularization and parameter-sharing via networks—will generalize to other architectural modalities (e.g., multimodal transformers) and distributed/federated fine-tuning settings, especially where sample efficiency is paramount.
6. Limitations and Open Directions
While DoRAN incurs negligible additional parameter cost, the introduction of auxiliary networks brings extra architectural choices and hyperparameters. Careful design and tuning may be necessary for optimal cross-layer coupling and stability in diverse foundation model classes.
Potential avenues for future inquiry include:
- Precise characterization of optimal $\epsilon$ scheduling during training.
- Extensions to recurrent, graph, or multimodal model families.
- More advanced network architectures for low-rank generation beyond simple feedforward models.
- Analytical studies of generalization and expressivity, especially in low-resource adaptation.
7. Summary Table: Core Differences
| Method | Stability Mechanism | Low-Rank Generation |
|---|---|---|
| LoRA | None; direct update | Per-layer static matrices |
| DoRA | Normalization; no regularizer | Per-layer static matrices |
| DoRAN | Adaptive regularizer $\epsilon$ | Auxiliary networks (shared) |
DoRAN thus emerges as a robust, efficient, and theoretically principled PEFT method, combining adaptive normalization and parameter-sharing architectures to yield superior fine-tuning behavior and sample efficiency, as validated on multiple vision and language domains (Diep et al., 5 Oct 2025).