SingLoRA: Symmetric Low-Rank Adaptation

Updated 9 July 2025
  • SingLoRA is a parameter-efficient fine-tuning technique that replaces dual low-rank matrices with a single symmetric update, addressing instability in classical methods.
  • It employs a ramp-up function and symmetric A Aᵀ update, halving the number of trainable parameters while maintaining stable optimization across large-width models.
  • Empirical results show SingLoRA outperforms traditional LoRA techniques in language and vision tasks, achieving higher accuracy with reduced computational overhead.

SingLoRA is a parameter-efficient fine-tuning method for large-scale neural networks that modifies the architecture of low-rank adaptation by learning weight updates with a single low-rank matrix and its transpose, rather than the standard product of two distinct low-rank matrices. This design addresses instability and over-parameterization issues that commonly arise in classical Low-Rank Adaptation (LoRA) schemes, providing guaranteed stability in large-width regimes and empirically better accuracy with reduced parameter budgets across language understanding and generative modeling tasks (Bensaïd et al., 8 Jul 2025).

1. Reformulation of Low-Rank Adaptation

Traditional LoRA updates a frozen pretrained weight matrix $W_0 \in \mathbb{R}^{d \times k}$ by the product of two trainable low-rank matrices, $B \in \mathbb{R}^{d \times r}$ and $A \in \mathbb{R}^{r \times k}$, so that $W = W_0 + BA$ (with $r \ll d, k$). Recent findings have shown that mismatched scaling between $A$ and $B$ often leads to unstable optimization: the learning dynamics of each matrix can interfere due to divergent parameter magnitudes, especially as model width grows.

SingLoRA proposes a symmetric update that replaces $BA$ with $A A^\top$, so the adapted model weights take the form:

$$W = W_0 + \frac{\alpha}{r} \cdot u(t) \cdot A A^\top$$

where $A \in \mathbb{R}^{n \times r}$ is the only trainable matrix, $u(t)$ is a ramp function (typically $u(t) = \min(t/T, 1)$ over training steps $t$ and ramp period $T$), and $\alpha$ is a scaling hyperparameter. This symmetric construction inherently sidesteps inter-matrix scale conflicts by learning a single parameter matrix.
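To make the contrast concrete, the following PyTorch sketch builds both update forms for a square weight matrix. The width $n$, rank $r$, scale $\alpha$, and ramp period $T$ are illustrative values, not settings from the paper, and this is not the authors' reference code.

```python
import torch

# Illustrative settings (not from the paper): width, rank, scale, ramp period
n, r, alpha, T = 1024, 8, 4.0, 1000

W0 = torch.randn(n, n)               # frozen pretrained weight

# Classical LoRA: two trainable matrices, delta_W = B A  (2*n*r parameters)
B = torch.zeros(n, r)                # B starts at zero so the initial update is zero
A_lora = torch.randn(r, n) / n ** 0.5
delta_lora = B @ A_lora

# SingLoRA: one trainable matrix, delta_W = (alpha/r) * u(t) * A A^T  (n*r parameters)
A = torch.randn(n, r) * n ** -0.5    # entries of order n^{-1/2}
t = 250                              # current training step
u_t = min(t / T, 1.0)                # linear ramp-up
delta_singlora = (alpha / r) * u_t * (A @ A.T)

W_lora = W0 + delta_lora
W_singlora = W0 + delta_singlora
print("LoRA params:", 2 * n * r, "SingLoRA params:", n * r)   # 16384 vs 8192
```

Because $A A^\top$ is always square and symmetric, the sketch assumes a square target matrix; adapting rectangular weights requires an additional convention that is not shown here.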

2. Theoretical Properties and Infinite-Width Analysis

A rigorous analysis in the infinite-width regime demonstrates that SingLoRA’s parameterization ensures stable feature learning by construction. In detail, by adopting scaling rules where the entries of $A$ are initialized (and maintained) at order $\Theta(n^{-1/2})$ (with appropriate learning rates), the symmetric update $A A^\top$ preserves output magnitudes at $\Theta(1)$ as the network width $n \to \infty$.

This eliminates the need for separate learning rate tuning for two matrices and avoids vanishing or exploding gradients—a problem long observed in classical LoRA-based and two-matrix schemes. Consequently, the optimization dynamics remain stable across width scales and throughout training.
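A back-of-the-envelope version of this scaling argument (a heuristic sketch rather than the paper's full derivation, ignoring cancellations as is standard in feature-learning analyses, and assuming an input $x$ with $\Theta(1)$ entries) runs as follows:

$$(A A^\top)_{ij} = \sum_{s=1}^{r} A_{is} A_{js} = r \cdot \Theta(n^{-1/2}) \cdot \Theta(n^{-1/2}) = \Theta\!\left(\tfrac{r}{n}\right)$$

$$\left(A A^\top x\right)_i = \sum_{j=1}^{n} (A A^\top)_{ij}\, x_j = n \cdot \Theta\!\left(\tfrac{r}{n}\right) \cdot \Theta(1) = \Theta(r) = \Theta(1) \quad \text{for fixed } r,$$

so the adapter's contribution to each output coordinate neither vanishes nor explodes as $n \to \infty$.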

3. Methodological Details and Implementation

The update rule in SingLoRA replaces LoRA’s two-matrix structure with a single trainable matrix $A$:

  • Initialization: $A$ is initialized with entries drawn from $\mathcal{N}(0, n^{-1/2})$; no second matrix is required.
  • Updating: At each training step, the adapted weight is

$$W \leftarrow W_0 + \frac{\alpha}{r} \cdot u(t) \cdot A A^\top$$

where $u(t)$ ramps up linearly from $0$ to $1$ over a warm-up period.

The single symmetric update requires approximately half as many trainable parameters for the same rank $r$ as LoRA and its variants, resulting in reduced memory consumption and potentially smaller communication overhead during distributed fine-tuning. The ramp function $u(t)$ is employed to gradually introduce the low-rank update, further stabilizing early-stage dynamics.
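The following PyTorch module is a minimal sketch of this procedure for a square linear layer. The class name, the step-counting logic, and the initialization constant are illustrative choices rather than the authors' reference implementation.

```python
import torch
import torch.nn as nn

class SingLoRALinear(nn.Module):
    """Sketch of a SingLoRA adapter around a frozen square linear layer:
    W = W_0 + (alpha / r) * u(t) * A A^T, with u(t) = min(t / T, 1)."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 4.0, ramp_steps: int = 1000):
        super().__init__()
        n = base.in_features
        assert base.out_features == n, "sketch assumes a square weight matrix"
        self.base = base
        self.base.weight.requires_grad_(False)        # freeze W_0
        self.r, self.alpha, self.ramp_steps = r, alpha, ramp_steps
        # Single trainable matrix with entries of order n^{-1/2} (illustrative init)
        self.A = nn.Parameter(torch.randn(n, r) * n ** -0.5)
        self.register_buffer("step", torch.zeros((), dtype=torch.long))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        u_t = min(self.step.item() / self.ramp_steps, 1.0)   # linear ramp-up
        if self.training:
            self.step += 1
        delta = (self.alpha / self.r) * u_t * (self.A @ self.A.T)
        return self.base(x) + x @ delta                # (W_0 + delta) x; delta is symmetric

# Usage sketch: adapt one projection of a transformer block
layer = SingLoRALinear(nn.Linear(768, 768, bias=False), r=8)
out = layer(torch.randn(2, 16, 768))                   # (batch, seq, features)
```

In practice one would wrap selected projection matrices (for example, attention query and value weights) with such adapters and train only the $A$ parameters, much as with standard LoRA tooling.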

4. Empirical Evaluation and Performance

SingLoRA was validated on multiple tasks across NLP and computer vision:

  • Language Understanding: Fine-tuning RoBERTa-base and GPT-2 on GLUE benchmarks (MNLI, QQP, QNLI) showed mean accuracy improvements of approximately $0.9\%$ for RoBERTa and $1.1\%$ for GPT-2 relative to baseline LoRA, while using only about $50\%$ of the trainable parameters.
  • Large-Scale LLMs: When applied to LLaMA-7B fine-tuned on MNLI, SingLoRA achieved $91.3\%$ accuracy, outperforming LoRA ($89.1\%$), LoRA+ ($90.2\%$), and DoRA, while using roughly $60\%$ of their parameter budget.
  • Image Generation: In DreamBooth fine-tuning with Stable Diffusion V1.5, SingLoRA improved the DINO similarity score (an image-fidelity metric), achieving $0.151$ compared to $0.148$ for DoRA and $0.143$ for LoRA, and preserved prompt alignment as measured by CLIP text similarity.

These results indicate that SingLoRA matches or surpasses existing parameter-efficient adaptation techniques on common benchmarks in both domains.

5. Applications and Use Cases

SingLoRA is suitable for any scenario where LoRA-style adaptation is beneficial but memory and compute efficiency are critical:

  • Parameter-Efficient Fine-Tuning of LLMs: By halving the number of adaptation parameters and stabilizing learning, SingLoRA enables more resource-efficient deployment of large models, especially in multi-task and multi-domain settings.
  • Diffusion Models for Image Generation: Its symmetric adaptation proves effective for high-fidelity personalization tasks such as subject-driven generation (DreamBooth), where maintaining subject details and fidelity is challenging for conventional LoRA methods.

A plausible implication is that the symmetric structure of SingLoRA’s update could facilitate new model compression and deployment strategies in constrained or on-device environments.

6. Practical Implications, Limitations, and Outlook

The adoption of Symmetric Low-Rank Adaptation via SingLoRA offers several practical advantages:

  • Reduced Parameter Budget: Fewer parameters reduce memory load and may decrease distributed training communication costs.
  • Stable Hyperparameter Tuning: Single-matrix adaptation eliminates the need to hand-tune scale or learning rates between two matrices.
  • Empirical Robustness: Improved and stable training dynamics translate to better outcomes across tasks without custom schedules or optimization tricks.

Potential limitations include the inherent expressiveness constraints of a symmetric update; scenarios requiring non-symmetric adaptation may still benefit from alternative or composite methods such as DoRA or LoRA+. The empirical results indicate strong performance for $r \ll d, n$, but further studies on very shallow or specialized architectures are warranted.

Future work may explore hybrid schemes that combine SingLoRA with advanced adaptation modules, ablation of ramp-up strategies, or application to non-standard architectures (multi-modal or recurrent layers). Investigations into theoretical properties beyond the infinite-width regime and into hyperparameter sensitivity under resource-constrained deployment are also promising avenues.

7. Summary Table: SingLoRA vs. Conventional LoRA

| Method | Update Parameterization | # Trainable Params | Reported Accuracy (MNLI, LLaMA-7B) | DINO Score (DreamBooth SD V1.5) |
|---|---|---|---|---|
| LoRA | $W_0 + BA$ | $2nr$ | $89.1\%$ | $0.143$ |
| LoRA+ | $W_0 + BA$ with separate learning rates for $B$ and $A$ | $2nr$ | $90.2\%$ | N/A |
| DoRA | Weight-decomposed low-rank adaptation (magnitude and direction) | Varies | N/A | $0.148$ |
| SingLoRA | $W_0 + \frac{\alpha}{r}\, u(t)\, A A^\top$ | $nr$ | $91.3\%$ | $0.151$ |
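
As a concrete illustration of the parameter column (the layer size here is an assumed example, not a figure from the paper): for a square $4096 \times 4096$ projection adapted at rank $r = 8$, LoRA trains $2nr = 2 \cdot 4096 \cdot 8 = 65{,}536$ parameters per layer, whereas SingLoRA trains $nr = 4096 \cdot 8 = 32{,}768$, exactly half.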

In conclusion, SingLoRA represents a theoretically justified, empirically validated, and methodologically simplified means of parameter-efficient adaptation for large neural networks, exploiting a symmetric single-matrix formulation to guarantee stability while reducing the adaptation parameter count and improving empirical accuracy across NLP and computer vision tasks (Bensaïd et al., 8 Jul 2025).
