
DoRAN: Adaptive PEFT for Foundation Models

Updated 12 October 2025
  • DoRAN is a PEFT method that adapts large-scale foundation models through adaptive noise injection and auxiliary (hyper)networks for dynamic low-rank parameter generation.
  • It adds a learnable regularizer $\tau$ to the normalization denominator, stabilizing gradient updates and interpolating between LoRA-like and DoRA-like behavior.
  • Experimental results demonstrate improved accuracy and sample efficiency across vision and language tasks with minimal additional parameter cost.

DoRAN is a parameter-efficient fine-tuning (PEFT) method for large-scale foundation models that augments the Weight-Decomposed Low-Rank Adaptation (DoRA) approach through adaptive noise injection and auxiliary (hyper) networks for dynamic low-rank parameter generation. DoRAN specifically addresses stability and sample efficiency issues inherent in prior low-rank adaptation methods such as LoRA and DoRA, demonstrating improved empirical performance across both vision and language tasks.

1. Foundations: PEFT, LoRA, and DoRA

Parameter-efficient fine-tuning (PEFT) enables adaptation of overparameterized models with minimal trainable parameters by introducing structured, low-rank updates, typically without modifying the core (pre-trained) weights. LoRA modifies the original weight $W_0$ of a layer via $W = W_0 + BA$, where $B$ and $A$ are trainable low-rank matrices. DoRA advances this by explicitly decomposing the weight into magnitude and directional components: $W = m \cdot (W_0 + BA) / \|W_0 + BA\|$, where $m$ is a (learnable) scaling vector and the denominator normalizes the update direction, aiming to better approximate full fine-tuning dynamics.
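The two reconstructions can be written out in a few lines of PyTorch. The snippet below is a minimal sketch of the formulas above, not a reference implementation: the column-wise norm, the toy shapes, and the initialization choices are assumptions made for illustration.

```python
import torch

def lora_weight(W0: torch.Tensor, B: torch.Tensor, A: torch.Tensor) -> torch.Tensor:
    """LoRA: add a low-rank update BA to the frozen pre-trained weight."""
    return W0 + B @ A                       # W0: (d_out, d_in), B: (d_out, r), A: (r, d_in)

def dora_weight(W0: torch.Tensor, B: torch.Tensor, A: torch.Tensor,
                m: torch.Tensor) -> torch.Tensor:
    """DoRA: rescale the direction of (W0 + BA) by a learnable magnitude vector m.
    The norm is taken column-wise here (one magnitude per column) -- an assumption
    of this sketch rather than the exact convention of the paper."""
    V = W0 + B @ A
    return m * V / V.norm(dim=0, keepdim=True)

# Toy shapes: d_out = 8, d_in = 16, rank r = 2 (illustrative only).
W0 = torch.randn(8, 16)
B, A = torch.zeros(8, 2), torch.randn(2, 16)   # LoRA-style init: one factor zero, so BA = 0
m = W0.norm(dim=0, keepdim=True)               # DoRA-style init: magnitude of the frozen weight
print(torch.allclose(dora_weight(W0, B, A, m), W0))   # True: at init the weight is unchanged
```

With this layout only $B$, $A$, and $m$ are trainable while $W_0$ stays frozen, which is what makes both schemes parameter-efficient.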

However, two primary limitations are observed in DoRA:

  • The normalization denominator $\|W_0 + BA\|$ may become small, resulting in gradient instability (potentially exploding gradients).
  • The use of layer-local, static low-rank matrices can restrict sample efficiency and prevent sharing of adaptation information across layers.

2. DoRAN Core Algorithm: Stabilization and Network-Based Parameterization

DoRAN introduces two central modifications:

2.1 Noise Injection and Adaptive Regularization

To stabilize normalization, DoRAN adds a learnable positive regularizer $\tau$ to the denominator: $W = m \cdot (W_0 + BA) / (\|W_0 + BA\| + \tau)$. Here $\tau \in \mathbb{R}^+$ serves as an adaptive noise buffer, reducing sensitivity to near-zero norms:

  • If $\tau$ is small, DoRAN approaches DoRA-like behavior, primarily learning directional updates.
  • For large $\tau$, normalization is relaxed, closer to the unnormalized update regime of LoRA.

This controlled interpolation manages the gradient's parallel and orthogonal components (with respect to $W_0$), guarding against vanishing denominators and providing stable learning dynamics, as formalized in the paper’s gradient analysis.
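A small extension of the earlier sketch makes the interpolation concrete. Again, the column-wise norm and the toy shapes are assumptions; the numbers only illustrate the two limiting regimes described above.

```python
import torch

def tau_dora_weight(W0, B, A, m, tau):
    """DoRA-style reconstruction with a positive offset tau added to the denominator
    (column-wise norm assumed, as in the earlier sketch)."""
    V = W0 + B @ A
    return m * V / (V.norm(dim=0, keepdim=True) + tau)

W0, B, A = torch.randn(8, 16), torch.randn(8, 2), torch.randn(2, 16)
m = torch.ones(1, 16)

# Small tau: columns are (nearly) unit-normalized, so only the direction matters (DoRA-like).
W_small = tau_dora_weight(W0, B, A, m, tau=1e-6)
print(W_small.norm(dim=0)[:3])      # each entry is ~1.0 because m = 1

# Large tau: normalization is relaxed; the weight reduces to the raw update W0 + BA
# scaled by roughly m / tau, i.e. an unnormalized, LoRA-like update up to a uniform scale.
W_large = tau_dora_weight(W0, B, A, m, tau=1e3)
rel_err = (W_large - (W0 + B @ A) / 1e3).norm() / ((W0 + B @ A) / 1e3).norm()
print(rel_err)                      # well below 1%: the large-tau regime behaves like scaled LoRA
```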

2.2 Auxiliary Networks for Dynamic Low-Rank Generation

DoRAN replaces per-layer static low-rank matrices with small auxiliary feedforward networks (hypernetworks) $g_1$ and $g_2$, mapping a shared latent embedding $e$ to the low-rank factors: $B = g_1(e)$, $A = g_2(e)$. Consequently, the low-rank update in each layer is generated dynamically, with shared parameters enabling coupling of adaptation information across layers and attention heads. This structural coupling promotes greater sample efficiency, especially under data scarcity, while retaining model expressiveness.
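The sketch below illustrates the hypernetwork idea with simple two-layer MLPs for $g_1$ and $g_2$ and a learnable shared embedding. The architecture, embedding scheme, and dimensions are assumptions for illustration and may differ from the paper's design.

```python
import torch
import torch.nn as nn

class LowRankHyperNet(nn.Module):
    """Maps a shared latent embedding e to the low-rank factors B and A of one layer.
    A single instance can be reused across layers (each with its own embedding),
    which is how this sketch models cross-layer parameter sharing."""
    def __init__(self, emb_dim: int, d_out: int, d_in: int, rank: int, hidden: int = 64):
        super().__init__()
        self.d_out, self.d_in, self.rank = d_out, d_in, rank
        self.g1 = nn.Sequential(nn.Linear(emb_dim, hidden), nn.ReLU(),
                                nn.Linear(hidden, d_out * rank))   # generates B
        self.g2 = nn.Sequential(nn.Linear(emb_dim, hidden), nn.ReLU(),
                                nn.Linear(hidden, rank * d_in))    # generates A

    def forward(self, e: torch.Tensor):
        B = self.g1(e).view(self.d_out, self.rank)
        A = self.g2(e).view(self.rank, self.d_in)
        return B, A

# One learnable embedding per adapted layer; the hypernetwork weights are shared.
hyper = LowRankHyperNet(emb_dim=32, d_out=8, d_in=16, rank=2)
e_layer = nn.Parameter(torch.randn(32))
B, A = hyper(e_layer)
print(B.shape, A.shape)   # torch.Size([8, 2]) torch.Size([2, 16])
```

Because gradients flow through $g_1$, $g_2$, and the shared embeddings, updating the adapter for one layer also shapes the generator used by the others, which is the coupling credited with the sample-efficiency gains.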

3. Mathematical Formulation and Gradient Behavior

The full DoRAN update for a linear (or affine) layer, substituting $B = g_1(e)$ and $A = g_2(e)$, is: $W = m \cdot (W_0 + g_1(e)\,g_2(e)) / (\|W_0 + g_1(e)\,g_2(e)\| + \tau)$

Gradient analysis reveals:

  • The update decomposes into parallel (norm scaling) and orthogonal (directional) contributions.
  • $\tau$ adaptively regularizes both components, preventing instability and facilitating robust learning; a differential sketch follows below.
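The stabilizing role of $\tau$ can be made explicit by differentiating the normalized reconstruction. The derivation below is a sketch obtained directly from the formula above with $V = W_0 + BA$ and a generic norm; it is not the paper's exact statement.

```latex
% Differential of the tau-stabilized reconstruction, with V := W_0 + BA
\[
  \widehat{W} \;=\; \frac{V}{\lVert V \rVert + \tau},
  \qquad
  \mathrm{d}\lVert V \rVert \;=\; \frac{\langle V,\, \mathrm{d}V \rangle}{\lVert V \rVert},
\]
\[
  \mathrm{d}\widehat{W}
  \;=\;
  \underbrace{\frac{\mathrm{d}V}{\lVert V \rVert + \tau}}_{\text{full perturbation}}
  \;-\;
  \underbrace{\frac{\langle V,\, \mathrm{d}V \rangle}
                   {\lVert V \rVert \,(\lVert V \rVert + \tau)^{2}}\; V}_{\text{component parallel to } V}.
\]
```

As $\tau \to 0$, the parallel part of $\mathrm{d}V$ is projected out and only directional (DoRA-like) updates remain; for large $\tau$, the second term vanishes and $\mathrm{d}\widehat{W} \approx \mathrm{d}V / \tau$, a uniformly rescaled LoRA-like update. Because $|\langle V, \mathrm{d}V\rangle| \le \lVert V \rVert\,\lVert \mathrm{d}V \rVert$, both terms stay of order $\lVert \mathrm{d}V \rVert / \tau$ even as $\lVert V \rVert \to 0$, whereas with $\tau = 0$ the differential diverges.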

4. Experimental Evaluation

Benchmarking covers both vision and language adaptation scenarios.

4.1 Vision: VTAB-1K and FGVC

  • DoRAN is instantiated atop a ViT-B/16 backbone pre-trained on ImageNet-21K.
  • On the VTAB-1K benchmark, adding only the stabilizing $\tau$ ("$\tau$-DoRA") increases average accuracy by roughly 0.5% over DoRA, while including the auxiliary networks yields gains of up to 1.8%.
  • Fine-grained visual categorization (FGVC) shows similar improvements with minimal parameter overhead (roughly 0.09% additional trainable parameters compared to DoRA).

4.2 Language: Commonsense Reasoning

  • Tasks encompass eight commonsense benchmarks (e.g., BoolQ, PIQA, HellaSwag, ARC-c) with LLaMA-7B and LLaMA-13B.
  • DoRAN surpasses LoRA and DoRA by 1–2% in accuracy and demonstrates substantially improved sample efficiency, particularly in low-data regimes.

5. Theoretical and Practical Implications

  • Noise injection via $\tau$ offers tunable regularization, interpolating stably between LoRA and DoRA behaviors.
  • Auxiliary network parameterization enables cross-layer sharing and enhances data efficiency, facilitating robust adaptation with minimal added compute or parameters.
  • DoRAN’s construction allows theoretically controlled trade-offs between magnitude and direction adaptation, bridging the rigidity of DoRA with the scale flexibility of LoRA.

A plausible implication is that DoRAN’s two-stage approach—adaptive regularization and parameter-sharing via networks—will generalize to other architectural modalities (e.g., multimodal transformers) and distributed/federated fine-tuning settings, especially where sample efficiency is paramount.

6. Limitations and Open Directions

While DoRAN incurs negligible additional parameter cost, the introduction of auxiliary networks brings extra architectural choices and hyperparameters. Careful design and tuning may be necessary for optimal cross-layer coupling and stability in diverse foundation model classes.

Potential avenues for future inquiry include:

  • Precise characterization of optimal $\tau$ scheduling during training.
  • Extensions to recurrent, graph, or multimodal model families.
  • More advanced network architectures for low-rank generation beyond simple feedforward models.
  • Analytical studies of generalization and expressivity, especially in low-resource adaptation.

7. Summary Table: Core Differences

| Method | Stability Mechanism | Low-Rank Generation |
| --- | --- | --- |
| LoRA | None; direct update | Per-layer static matrices |
| DoRA | Normalization; no $\tau$ | Per-layer static matrices |
| DoRAN | Adaptive $\tau$ regularizer | Auxiliary networks (shared) |

DoRAN thus emerges as a robust, efficient, and theoretically principled PEFT method, combining adaptive normalization and parameter-sharing architectures to yield superior fine-tuning behavior and sample efficiency, as validated on multiple vision and language domains (Diep et al., 5 Oct 2025).
