OrthoMerge: Orthogonal Model Merging

Updated 6 May 2026

OrthoMerge is a family of methods that use explicit orthogonality in parameter, data, or transformation spaces to mitigate destructive interference when merging task-specific model adaptations.
It leverages techniques like DO-Merging, OSRM, and Lie-Manifold Merge to decouple magnitude from direction, enforce subspace orthogonality, and preserve geometric structure.
Empirical evaluations across vision, language, and multi-modal benchmarks demonstrate significant accuracy gains and reduced merge loss compared to traditional model merging approaches.

OrthoMerge is a family of methods for model merging that employ explicit orthogonality—in parameter space, data space, or transformation space—to mitigate destructive interference when combining task-specific model adaptations. It is particularly relevant for merging low-rank adaptation (LoRA) modules and orthogonally fine-tuned models, and encompasses several distinct but related frameworks with rigorous theoretical underpinnings and broad empirical validation (Zheng et al., 21 May 2025, Yang et al., 5 Feb 2026, Zhang et al., 28 May 2025).

1. Motivation and Problem Setting

Model merging seeks to integrate multiple specialized models into a single unified set of weights, reducing deployment, training, and inference costs. Conventional approaches such as Task Arithmetic (simple averaging or weighted sum of parameter deltas) are effective for fully fine-tuned models but fail for LoRA or other structured adaptation methods, resulting in performance degradation (Zheng et al., 21 May 2025, Zhang et al., 28 May 2025). The underlying reasons include:

- High column-wise magnitude variance in LoRA modules, which lets a single task dominate the merge
- Interference between subspaces associated with different tasks, particularly when their supports overlap
- Neglect of geometric properties, such as preservation of hyperspherical energy or orthogonality, that are inherent to some fine-tuning schemes

OrthoMerge addresses these pitfalls by bringing task decoupling, orthogonality, and geometric structure preservation into merging operations.

2. Theoretical Principles and Frameworks

Decoupling Magnitude and Direction (DO-Merging)

For LoRA, the weight update for task $i$ at a given layer is $\Delta W_i = B_i A_i$, where $B_i \in \mathbb{R}^{m \times r}$ and $A_i \in \mathbb{R}^{r \times n}$. DO-Merging decomposes each $\Delta W_i$ into:

- Magnitude vector $\alpha_i \in \mathbb{R}^n$ with $\alpha_i[j] = \|(\Delta W_i)_{:,j}\|_2$, $j = 1, \ldots, n$
- Direction matrix $\bar W_i$, obtained by dividing column $j$ of $\Delta W_i$ by $\alpha_i[j]$

Each update can thus be written as $\Delta W_i = \bar W_i \, \text{Diag}(\alpha_i)$, where right-multiplication by the diagonal matrix rescales the columns. Decoupling isolates cross-task magnitude variance from the mixing of directions, preventing parameter dominance and information loss (Zheng et al., 21 May 2025).
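As a minimal sketch of this decomposition, the NumPy snippet below splits a toy LoRA update into column-wise magnitudes and unit-norm directions and then rebuilds it. The function name `decouple`, the `eps` guard against zero columns, and the toy shapes are illustrative assumptions, not part of the DO-Merging reference code.

```python
import numpy as np

def decouple(delta_w, eps=1e-12):
    """Split an update into column-wise magnitudes and unit-norm directions.

    `decouple` and `eps` are hypothetical names used only for this sketch.
    """
    alpha = np.linalg.norm(delta_w, axis=0)    # shape (n,): one L2 norm per column
    w_bar = delta_w / np.maximum(alpha, eps)   # divide column j by alpha[j]
    return alpha, w_bar

rng = np.random.default_rng(0)
m, n, r = 6, 4, 2                              # toy sizes, chosen arbitrarily
B = rng.standard_normal((m, r))
A = rng.standard_normal((r, n))
delta_w = B @ A                                # LoRA update for one task

alpha, w_bar = decouple(delta_w)
reconstructed = w_bar * alpha                  # rescale column j by alpha[j]
```

Keeping `alpha` and `w_bar` separate is what lets a merging rule treat scale and direction independently before recombining them.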

Orthogonality Constraints

To further reduce interference, OrthoMerge applies layer-wise orthogonalization to the direction matrices. For tasks $i \neq j$, the directions $\bar W_i$ and $\bar W_j$ are adjusted via small perturbations, chosen to minimize an objective defined on pairs of perturbed direction matrices.

This penalizes overlap between task-specific updates and is performed data-free, i.e., without input samples. Theoretical results guarantee reduction of merge loss (expected performance drop) due to magnitude variance and task conflict (Zheng et al., 21 May 2025).
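The following sketch illustrates data-free orthogonalization by gradient descent. The exact objective is not reproduced here, so the code assumes one plausible overlap penalty, the sum over task pairs of $\|X_i^\top X_j\|_F^2$ with $X_i = \bar W_i + \delta_i$. The function names, learning rate, and step count are likewise assumptions made for illustration.

```python
import numpy as np

def overlap(dirs):
    """Assumed penalty: sum over pairs i < j of ||X_i^T X_j||_F^2."""
    total = 0.0
    for i in range(len(dirs)):
        for j in range(i + 1, len(dirs)):
            total += np.sum((dirs[i].T @ dirs[j]) ** 2)
    return total

def orthogonalize(dirs, lr=0.01, steps=200):
    """Learn additive perturbations that reduce the assumed overlap penalty."""
    deltas = [np.zeros_like(d) for d in dirs]
    for _ in range(steps):
        cur = [d + p for d, p in zip(dirs, deltas)]
        grads = []
        for i in range(len(cur)):
            g = np.zeros_like(cur[i])
            for j in range(len(cur)):
                if j != i:
                    # Gradient of ||X_i^T X_j||_F^2 w.r.t. X_i is 2 X_j X_j^T X_i.
                    g += 2.0 * cur[j] @ (cur[j].T @ cur[i])
            grads.append(g)
        deltas = [p - lr * g for p, g in zip(deltas, grads)]
    return [d + p for d, p in zip(dirs, deltas)]

rng = np.random.default_rng(0)
dirs = []
for _ in range(3):                              # three toy tasks
    w = rng.standard_normal((16, 4))
    dirs.append(w / np.linalg.norm(w, axis=0))  # unit-norm columns

before = overlap(dirs)
after = overlap(orthogonalize(dirs))
```

No input samples appear anywhere in the loop, which is the sense in which the procedure is data-free.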

Orthogonal-Subspace Preconditioning

Orthogonal Subspaces for Robust Model Merging (OSRM) constrains the row space of each LoRA matrix $A_i$ before fine-tuning, so that the latent features of all other tasks are orthogonal to $A_i$. For task $i$:

- The LoRA "input" matrix $A_i$ is initialized with the bottom-$r$ eigenvectors of the covariance of out-of-task latent features at the given layer.
- The objective enforces $A_i h \approx 0$ for latent features $h$ drawn from tasks $j \neq i$, systematically suppressing inter-task crosstalk in parameter updates (Zhang et al., 28 May 2025).
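A small sketch of this initialization, under stated assumptions: the covariance is taken as the uncentered second-moment matrix of the out-of-task features, and the helper name `osrm_init_A` and the toy feature generator are hypothetical. With features confined to a low-dimensional subspace, the bottom eigenvectors fall in its orthogonal complement, so the resulting matrix annihilates those features.

```python
import numpy as np

def osrm_init_A(other_task_features, r):
    """Return an (r, n) matrix whose rows are the bottom-r eigenvectors.

    other_task_features: (num_samples, n) latent inputs from the other tasks.
    Assumption: uncentered second-moment matrix stands in for the covariance.
    """
    H = other_task_features
    second_moment = H.T @ H / H.shape[0]              # (n, n), symmetric PSD
    eigvals, eigvecs = np.linalg.eigh(second_moment)  # eigenvalues ascending
    return eigvecs[:, :r].T                           # least-energetic directions

rng = np.random.default_rng(0)
n, r = 8, 2
basis = rng.standard_normal((5, n))                   # features span a 5-dim subspace
H_other = rng.standard_normal((100, 5)) @ basis       # (100, n) out-of-task features

A_i = osrm_init_A(H_other, r)
crosstalk = A_i @ H_other.T                           # (r, 100), close to zero here
```

In real activations the spectrum rarely has exact zeros, so the product is small rather than exactly zero.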

Geometric Manifold Merging via Lie Theory

When merging models fine-tuned by Orthogonal Finetuning (OFT), each adaptation is represented as an orthogonal matrix $R_i$, and the merged adaptation $R_{\text{merged}}$ is constructed on the orthogonal group manifold:

- Map each $R_i$ to the Lie algebra of skew-symmetric matrices
- Merge: average the resulting elements in the algebra
- Map back: return the merged element to the orthogonal group

This approach exactly preserves geometric properties such as norm and inner product, preventing spectral-norm drift and hyperspherical energy loss (Yang et al., 5 Feb 2026).
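The sketch below shows why merging in the algebra keeps the result on the orthogonal group. The maps between group and algebra are not specified here, so the code assumes the Cayley transform (common in OFT-style parameterizations) as a stand-in; the matrix logarithm and exponential are the other usual choice. All function names are hypothetical.

```python
import numpy as np

def cayley(S):
    """Skew-symmetric S -> orthogonal R = (I + S)(I - S)^{-1}."""
    I = np.eye(S.shape[0])
    return (I + S) @ np.linalg.inv(I - S)

def inverse_cayley(R):
    """Orthogonal R without eigenvalue -1 -> skew-symmetric S."""
    I = np.eye(R.shape[0])
    return (R - I) @ np.linalg.inv(R + I)

def merge_orthogonal(rotations):
    """Average in the space of skew-symmetric matrices, then map back."""
    skews = [inverse_cayley(R) for R in rotations]
    mean_skew = sum(skews) / len(skews)        # still skew-symmetric
    return cayley(mean_skew)

rng = np.random.default_rng(0)
d = 4
rotations = []
for _ in range(3):                             # three toy task adaptations
    M = rng.standard_normal((d, d))
    rotations.append(cayley(0.5 * (M - M.T)))

R_merged = merge_orthogonal(rotations)         # orthogonal by construction
R_naive = sum(rotations) / len(rotations)      # plain average, generally not orthogonal
```

Comparing `R_merged.T @ R_merged` with `R_naive.T @ R_naive` shows the difference: only the first is the identity up to rounding.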

For general fine-tuned weights, the orthogonal Procrustes problem extracts the closest orthogonal matrix to an update, with the residual handled by standard merging.
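One way to sketch the Procrustes step uses the standard SVD solution. The setup is an assumption: it pairs a base weight `w_base` with a fine-tuned weight `w_ft`, seeks the orthogonal $Q$ minimizing $\|Q W_{\text{base}} - W_{\text{ft}}\|_F$, and treats the remainder as the residual. The function name is hypothetical.

```python
import numpy as np

def orthogonal_procrustes_split(w_base, w_ft):
    """Return orthogonal Q minimizing ||Q @ w_base - w_ft||_F and the residual."""
    U, _, Vt = np.linalg.svd(w_ft @ w_base.T)  # SVD of the cross-product matrix
    Q = U @ Vt                                 # closest orthogonal factor
    residual = w_ft - Q @ w_base               # part a rotation cannot explain
    return Q, residual

rng = np.random.default_rng(0)
d, k = 4, 6
w_base = rng.standard_normal((d, k))
Q_true, _ = np.linalg.qr(rng.standard_normal((d, d)))         # random orthogonal matrix
w_ft = Q_true @ w_base + 0.01 * rng.standard_normal((d, k))   # rotation plus small noise

Q, residual = orthogonal_procrustes_split(w_base, w_ft)
```

The orthogonal factors from several tasks can then be merged on the group, while the residuals are combined additively with a standard merging rule.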

3. Merging Algorithms and Implementation

Three principal OrthoMerge algorithms are instantiated from these principles:

Method	Key Step	Where Applied
DO-Merging	Decouple magnitude and direction, orthogonalize	LoRA module merging (post hoc, data-free)
OSRM	Orthogonalize LoRA subspace (pre fine-tuning)	LoRA module merging (pre-finetuning, data-driven)
Lie-Manifold Merge	Projection to Lie algebra, merge, map back	Orthogonal Finetuning, general finetuned adapters

DO-Merging Algorithm (Zheng et al., 21 May 2025):

For each layer and task, compute $\Delta W_i = B_i A_i$, decompose it into magnitude and direction, orthogonalize the directions via small perturbations found by gradient descent, sum the magnitudes and the orthogonalized directions, reconstruct the merged update, and merge it with the base weights.

OSRM Procedure (Zhang et al., 28 May 2025):

Before fine-tuning, collect latent features per task; for each task and layer, initialize $A_i$ to the bottom eigenvectors of the covariance of all other tasks' features. Fine-tune as usual. Merge LoRA adapters with any standard technique (e.g., Task Arithmetic, Fisher, RegMean).
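A toy end-to-end sketch of this procedure with Task Arithmetic as the merging rule. Several simplifications are assumed: two tasks whose latent features occupy disjoint coordinates, input matrices kept at their initialization, random matrices standing in for the fine-tuned $B_i$, and a hypothetical scaling coefficient `lam`.

```python
import numpy as np

def bottom_eigvecs(features, r):
    """Rows are the r eigenvectors with smallest eigenvalues of the second moment."""
    second_moment = features.T @ features / features.shape[0]
    _, vecs = np.linalg.eigh(second_moment)    # eigenvalues in ascending order
    return vecs[:, :r].T

rng = np.random.default_rng(0)
m, n, r = 5, 8, 2

# Toy latent features: task 1 uses coordinates 0-2, task 2 uses coordinates 3-5.
h1 = np.zeros((50, n))
h1[:, 0:3] = rng.standard_normal((50, 3))
h2 = np.zeros((50, n))
h2[:, 3:6] = rng.standard_normal((50, 3))

A1 = bottom_eigvecs(h2, r)                     # task 1 avoids task 2's features
A2 = bottom_eigvecs(h1, r)                     # task 2 avoids task 1's features
B1 = rng.standard_normal((m, r))               # stand-in for a fine-tuned B_1
B2 = rng.standard_normal((m, r))               # stand-in for a fine-tuned B_2

w_base = rng.standard_normal((m, n))
lam = 1.0                                      # hypothetical merge scaling
w_merged = w_base + lam * (B1 @ A1 + B2 @ A2)  # Task Arithmetic over LoRA deltas

x = h1[0]                                      # an input from task 1
out_merged = w_merged @ x
out_task1_only = (w_base + lam * (B1 @ A1)) @ x
```

On a task 1 input, the merged layer behaves like the layer carrying only the task 1 adapter, because the task 2 adapter maps that input to approximately zero.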

Orthogonal Model Merging (Yang et al., 5 Feb 2026):

Given OFT-trained models, map orthogonal weight updates to Lie algebra, average, map back to the group. For general adapters, extract orthogonal part via SVD; residuals are merged additively.

4. Empirical Evaluation and Performance

Experimental studies validate OrthoMerge approaches across vision, language, and multi-modal domains:

DO-Merging (Zheng et al., 21 May 2025):
- Vision (ViT-B/32, 8 tasks): Task Arithmetic 74.06%, DO-Merging 77.88% (+3.82 pp)
- Medium NLP (T5-base, 8 tasks): Task Arithmetic 77.4%, DO-Merging 80.9% (+3.5 pp)
- Large LLMs (LLaMa3-8B, 6 tasks): Task Arithmetic 83.55%, DO-Merging 87.11% (+3.56 pp)
- Orthogonalization alone yields ~2%, decoupling alone ~1%, combined ~3% normalized accuracy gains
OSRM (Zhang et al., 28 May 2025):
- On GLUE with RoBERTa-large, average improvement when OSRM is combined with each merging method: Task Arithmetic +6.6 pp, RegMean +1.9 pp, Fisher +7.0 pp, TIES +5.3 pp, EMR +2.1 pp
- Robust to hyperparameters: merge scaling, number of latent features, number of tasks, type of LoRA block
Orthogonal Model Merging (Yang et al., 5 Feb 2026):
- When merging OFT models, in-domain accuracy: OrthoMerge 46.25% vs. baselines 44.10–44.97%; out-of-domain: OrthoMerge 41.80% vs. 40.78–40.97%
- When applied to general adapters via Orthogonal-Residual Decoupling, consistently boosts all baselines by 0.2–2.4 points
- Exact preservation of hyperspherical energy; mitigates catastrophic forgetting

5. Integration and Practical Considerations

OrthoMerge algorithms are designed to integrate seamlessly into existing merging pipelines:

- Plug-and-play: OSRM and DO-Merging can be applied with no modifications to post-hoc merging code. OSRM is applied before fine-tuning and is data-driven; DO-Merging is applied after fine-tuning and is data-free.
- Computational cost: DO-Merging requires only inner products between task direction matrices at each layer, and its reported overhead is measured in GPU-minutes. OSRM costs one eigendecomposition per layer per task.
- Scalability: Each method is robust to the number of tasks and features; OSRM in particular maintains high performance as the number of merged tasks varies.
- Hyperparameters: For LoRA, common settings for the rank and the number of latent features per task apply; $A_i$ can be strictly orthogonal or softly constrained via fine-tuning.

6. Theoretical Guarantees and Analysis

- Magnitude imbalance: Merge loss is minimized when merged LoRA modules have matched magnitude vectors ($\alpha_i = \alpha_j$ for all tasks $i, j$) (Zheng et al., 21 May 2025).
- Benefit of decoupling: The expected merge loss is lower for decoupled-then-merged updates than for a naive linear merge when magnitudes differ.
- Orthogonality reduces conflict: Stricter orthogonality between direction matrices reduces "sign conflicts," minimizing destructive interference and preserving task-specific signal (Zheng et al., 21 May 2025, Zhang et al., 28 May 2025).
- Group manifold averaging: For OFT models, Riemannian averaging via the Lie algebra preserves norm and rotation, ensuring valid merged adaptors (Yang et al., 5 Feb 2026).

A frequent misconception is that parameter-space orthogonality between the LoRA deltas $\Delta W_i$ and $\Delta W_j$ suffices to prevent task interference. However, unless the data features for different tasks are taken into account, latent cross-talk persists at inference because parameter-space separation does not guarantee output-space orthogonality. Data-driven or geometric orthogonality (as in OSRM and manifold-based OrthoMerge) is necessary for robust interference suppression (Zhang et al., 28 May 2025, Yang et al., 5 Feb 2026).
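A two-by-two counterexample makes the gap between parameter-space and output-space orthogonality concrete. The matrices and the input vector are toy values chosen for illustration only.

```python
import numpy as np

# Two updates with zero Frobenius inner product (orthogonal in parameter space).
dw1 = np.array([[1.0, 0.0],
                [0.0, 0.0]])
dw2 = np.array([[0.0, 1.0],
                [0.0, 0.0]])
param_inner = np.sum(dw1 * dw2)   # 0.0

# An input that activates both updates.
x = np.array([1.0, 1.0])
out1 = dw1 @ x                    # [1., 0.]
out2 = dw2 @ x                    # [1., 0.]
output_inner = out1 @ out2        # 1.0: the two outputs fully overlap
```

Both updates write to the same output coordinate for this input, so they interfere even though their parameters are orthogonal. Whether that happens depends on the input distribution, which is why the data features matter.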

OrthoMerge is distinct from pre-merging methods that rely solely on pruning or clustering, and from model soups that apply linear combinations without structure-awareness. Its key contribution is the explicit management of both algebraic and geometric subspace overlap in the merging process.

References:

"Decouple and Orthogonalize: A Data-Free Framework for LoRA Merging" (Zheng et al., 21 May 2025)
"Orthogonal Model Merging" (Yang et al., 5 Feb 2026)
"Unraveling LoRA Interference: Orthogonal Subspaces for Robust Model Merging" (Zhang et al., 28 May 2025)
