OrthoMerge: Orthogonal Model Merging
- OrthoMerge is a family of methods that use explicit orthogonality in parameter, data, or transformation spaces to mitigate destructive interference when merging task-specific model adaptations.
- It leverages techniques like DO-Merging, OSRM, and Lie-Manifold Merge to decouple magnitude from direction, enforce subspace orthogonality, and preserve geometric structure.
- Empirical evaluations across vision, language, and multi-modal benchmarks demonstrate significant accuracy gains and reduced merge loss compared to traditional model merging approaches.
OrthoMerge is a family of methods for model merging that employ explicit orthogonality—in parameter space, data space, or transformation space—to mitigate destructive interference when combining task-specific model adaptations. It is particularly relevant for merging low-rank adaptation (LoRA) modules and orthogonally fine-tuned models, and encompasses several distinct but related frameworks with rigorous theoretical underpinnings and broad empirical validation (Zheng et al., 21 May 2025, Yang et al., 5 Feb 2026, Zhang et al., 28 May 2025).
1. Motivation and Problem Setting
Model merging seeks to integrate multiple specialized models into a single unified set of weights, reducing deployment, training, and inference costs. Conventional approaches such as Task Arithmetic (a simple average or weighted sum of task-specific parameter deltas) work well for fully fine-tuned models but often degrade when applied to LoRA or other structured adaptation methods (Zheng et al., 21 May 2025, Zhang et al., 28 May 2025). The main causes are:
- High variance in column-wise magnitudes across LoRA modules, which lets a single task dominate the merged update
- Interference between the subspaces associated with different tasks, particularly when their supports overlap
- Neglect of geometric structure inherent to some fine-tuning schemes, such as the orthogonality and hyperspherical energy preserved by orthogonal fine-tuning
OrthoMerge addresses these pitfalls by bringing task decoupling, orthogonality, and geometric structure preservation into merging operations.
2. Theoretical Principles and Frameworks
Decoupling Magnitude and Direction (DO-Merging)
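For LoRA, the weight update for task $i$ at a given layer is $\Delta W_i = B_i A_i$, where $B_i \in \mathbb{R}^{d \times r}$, $A_i \in \mathbb{R}^{r \times k}$, and $r \ll \min(d, k)$. DO-Merging decomposes each $\Delta W_i$ into:
- Magnitude vector $m_i = \|\Delta W_i\|_c \in \mathbb{R}^{k}$, the column-wise $\ell_2$ norms of $\Delta W_i$
- Direction matrix $V_i = \Delta W_i / \|\Delta W_i\|_c$, whose columns are normalized by $m_i$
Each update can thus be written as $\Delta W_i = V_i \operatorname{diag}(m_i)$. Decoupling separates cross-task variance in magnitude from the mixing of directions, which prevents one task's parameters from dominating the merge and reduces information loss (Zheng et al., 21 May 2025).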
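The decomposition can be sketched in NumPy as follows. This is a minimal illustration rather than the authors' code; the helper names `lora_delta`, `decompose`, and `recompose` are hypothetical, and shapes follow the convention $B_i \in \mathbb{R}^{d \times r}$, $A_i \in \mathbb{R}^{r \times k}$.

```python
import numpy as np

def lora_delta(B: np.ndarray, A: np.ndarray) -> np.ndarray:
    """LoRA weight update Delta W = B A, with B: (d, r) and A: (r, k)."""
    return B @ A

def decompose(delta_w: np.ndarray, eps: float = 1e-12):
    """Split Delta W into column-wise magnitudes m (shape (k,)) and a
    direction matrix V whose nonzero columns have unit L2 norm."""
    m = np.linalg.norm(delta_w, axis=0)      # column-wise L2 norms
    v = delta_w / np.maximum(m, eps)         # normalize each column (eps guards zeros)
    return m, v

def recompose(m: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Inverse of decompose: V diag(m), i.e. scale column j of V by m[j]."""
    return v * m                             # broadcasting scales columns

# Usage: a rank-4 update of a 16 x 8 weight matrix.
rng = np.random.default_rng(0)
B, A = rng.normal(size=(16, 4)), rng.normal(size=(4, 8))
dw = lora_delta(B, A)
m, v = decompose(dw)
```

Because magnitudes and directions are stored separately, they can be combined across tasks by different rules before recomposition.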
Orthogonality Constraints
To further reduce interference, DO-Merging orthogonalizes the direction matrices layer by layer. For tasks $i = 1, \dots, T$, the directions $V_i$ are adjusted by small perturbations $\Delta V_i$ chosen to minimize
$$\mathcal{L}_{\text{orth}} = \sum_{i \neq j} \left\| (V_i + \Delta V_i)^{\top} (V_j + \Delta V_j) \right\|_F^2 .$$
This objective penalizes overlap between task-specific directions and is optimized data-free, i.e., without any input samples. The accompanying theory shows that this reduces the merge loss (the expected performance drop) attributable to magnitude variance and task conflict (Zheng et al., 21 May 2025).
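A minimal sketch of the data-free orthogonalization step, assuming the objective is the pairwise overlap $\sum_{i \neq j} \|U_i^\top U_j\|_F^2$ with $U_i = V_i + \Delta V_i$. Two further choices are our own assumptions rather than details from the paper: perturbations are kept small simply by taking a few small gradient steps from $\Delta V_i = 0$, and columns are re-normalized after each step so that each $U_i$ remains a direction matrix (without this, the loss could be driven to zero by shrinking every $U_i$). Function names are hypothetical.

```python
import numpy as np

def unit_columns(x: np.ndarray) -> np.ndarray:
    """Normalize every column of x to unit L2 norm."""
    return x / np.linalg.norm(x, axis=0, keepdims=True)

def overlap_loss(dirs: list[np.ndarray]) -> float:
    """Sum over ordered pairs i != j of ||U_i^T U_j||_F^2."""
    return float(sum(np.sum((ui.T @ uj) ** 2)
                     for i, ui in enumerate(dirs)
                     for j, uj in enumerate(dirs) if i != j))

def orthogonalize(dirs: list[np.ndarray], lr: float = 1e-2,
                  steps: int = 100) -> list[np.ndarray]:
    """Projected gradient descent on overlap_loss, starting from Delta V_i = 0."""
    us = [v.copy() for v in dirs]
    for _ in range(steps):
        # Gradient w.r.t. U_i of the ordered-pair loss: 4 * sum_{j != i} U_j U_j^T U_i.
        grads = [sum(4.0 * uj @ (uj.T @ ui) for j, uj in enumerate(us) if j != i)
                 for i, ui in enumerate(us)]
        us = [u - lr * g for u, g in zip(us, grads)]   # simultaneous update
        us = [unit_columns(u) for u in us]             # keep columns unit-norm
    return us

# Usage: three tasks, each with a (64, 4) direction matrix.
rng = np.random.default_rng(0)
V = [unit_columns(rng.normal(size=(64, 4))) for _ in range(3)]
U = orthogonalize(V)
delta_V = [u - v for u, v in zip(U, V)]                # the learned perturbations
print(overlap_loss(V), overlap_loss(U))
```

The cost per step is dominated by the $T(T-1)$ pairwise products, which is why this step stays cheap for a handful of tasks.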
Orthogonal-Subspace Preconditioning
Orthogonal Subspaces for Robust Model Merging (OSRM) constrains the row space of each LoRA input matrix $A_i$ before fine-tuning, so that the latent features of all other tasks are (approximately) orthogonal to it. For task $i$:
- The LoRA "input" matrix $A_i \in \mathbb{R}^{r \times k}$ is initialized with the bottom-$r$ eigenvectors (those with the smallest eigenvalues) of the covariance $S_{\setminus i}^{(l)}$ of out-of-task latent features at layer $l$.
- The objective enforces $A_i h \approx 0$ for latent features $h$ of every task $j \neq i$, systematically suppressing inter-task crosstalk in the parameter updates (Zhang et al., 28 May 2025).
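The OSRM initialization amounts to a symmetric eigendecomposition. The sketch below is hypothetical (`osrm_init_A` is not an API from the paper) and assumes $S_{\setminus i}^{(l)} = H^\top H$, the uncentered second-moment matrix of the stacked out-of-task features $H \in \mathbb{R}^{n \times k}$; center $H$ first if a centered covariance is intended. `numpy.linalg.eigh` returns eigenvalues in ascending order, so the first $r$ eigenvectors are the bottom ones.

```python
import numpy as np

def osrm_init_A(other_feats: np.ndarray, r: int) -> np.ndarray:
    """Hypothetical helper: rows of A_i are the bottom-r eigenvectors of
    S = H^T H, where H (n, k) stacks the latent features of all other tasks
    at one layer. Returns an (r, k) matrix with orthonormal rows."""
    s = other_feats.T @ other_feats          # (k, k) uncentered second moment
    _, eigvecs = np.linalg.eigh(s)           # eigenvalues sorted ascending
    return eigvecs[:, :r].T                  # smallest-eigenvalue directions

# Usage: out-of-task features confined to a 10-dim subspace of R^32.
rng = np.random.default_rng(0)
k, r = 32, 4
H_other = rng.normal(size=(200, 10)) @ rng.normal(size=(10, k))
A_i = osrm_init_A(H_other, r)
crosstalk = np.linalg.norm(H_other @ A_i.T)  # ~0: A_i ignores other tasks' features
```

When the out-of-task features span at most $k - r$ dimensions, the bottom-$r$ eigenvectors lie in their null space and the crosstalk vanishes up to round-off; with full-rank real features it is only minimized.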
Geometric Manifold Merging via Lie Theory
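When merging models fine-tuned by Orthogonal Finetuning (OFT), each adaptation is represented as an orthogonal matrix $R_i$, and the merged adaptation $R_{\text{merged}}$ is constructed on the orthogonal-group manifold:
- Map to the Lie algebra: $Q_i = \log(R_i)$, a skew-symmetric matrix
- Merge in the algebra: $\bar{Q} = \frac{1}{T} \sum_{i=1}^{T} Q_i$
- Map back to the group: $R_{\text{merged}} = \exp(\bar{Q})$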
This approach exactly preserves geometric properties such as norm and inner product, preventing spectral-norm drift and hyperspherical energy loss (Yang et al., 5 Feb 2026).
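The three steps can be sketched with NumPy alone. In practice `scipy.linalg.logm` and `scipy.linalg.expm` would serve; here the logarithm is computed by eigendecomposition (valid for orthogonal matrices with determinant $+1$ and no eigenvalue at $-1$, which we assume) and the exponential of a skew-symmetric matrix is computed via the Hermitian matrix $iQ$. Function names are hypothetical, and uniform weights are used by default.

```python
import numpy as np

def skew_expm(q: np.ndarray) -> np.ndarray:
    """exp(Q) for real skew-symmetric Q, using that 1j*Q is Hermitian."""
    lam, vecs = np.linalg.eigh(1j * q)       # 1j*Q = V diag(lam) V^H
    # Q = V diag(-1j*lam) V^H, so exp(Q) = V diag(exp(-1j*lam)) V^H (real).
    return (vecs @ np.diag(np.exp(-1j * lam)) @ vecs.conj().T).real

def orth_logm(rot: np.ndarray) -> np.ndarray:
    """Principal log of an orthogonal matrix (det +1, no eigenvalue -1).
    Orthogonal matrices are normal, hence diagonalizable."""
    w, vecs = np.linalg.eig(rot)
    q = (vecs @ np.diag(np.log(w)) @ np.linalg.inv(vecs)).real
    return 0.5 * (q - q.T)                   # project onto skew-symmetric matrices

def lie_merge(rotations, weights=None) -> np.ndarray:
    """Map each R_i to Q_i = log(R_i), average in the Lie algebra, map back."""
    qs = [orth_logm(r) for r in rotations]
    if weights is None:
        weights = [1.0 / len(qs)] * len(qs)
    q_bar = sum(w * q for w, q in zip(weights, qs))
    return skew_expm(q_bar)

# Usage: merge three small random rotations of R^6.
rng = np.random.default_rng(0)

def random_rotation(n: int, scale: float = 0.1) -> np.ndarray:
    x = rng.normal(scale=scale, size=(n, n))
    return skew_expm(x - x.T)                # exp of a skew matrix is a rotation

rots = [random_rotation(6) for _ in range(3)]
r_merged = lie_merge(rots)
naive = sum(rots) / 3                        # Euclidean average, for contrast
```

The merged matrix stays exactly orthogonal, whereas the Euclidean average of the same rotations does not, which is the norm-drift problem the group-based merge avoids.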
For general fine-tuned weights, the orthogonal Procrustes problem is used to extract the orthogonal matrix $R_i$ closest to each update; the residual is handled by standard merging.
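One way to realize this extraction, assuming the fine-tuned weight is modeled as $W_i \approx R_i W_0 + E_i$ with base weight $W_0$ (our assumption; the paper's exact parameterization may differ), is the classical Procrustes solution $R_i = U V^\top$, where $U \Sigma V^\top$ is the SVD of $W_i W_0^\top$. The helper name `orthogonal_residual_split` is hypothetical.

```python
import numpy as np

def orthogonal_residual_split(w_ft: np.ndarray, w0: np.ndarray):
    """Solve min_R ||w_ft - R @ w0||_F over orthogonal R (Procrustes),
    returning R and the residual E = w_ft - R @ w0."""
    u, _, vt = np.linalg.svd(w_ft @ w0.T)    # SVD of the cross-correlation
    rot = u @ vt                             # closest orthogonal factor
    return rot, w_ft - rot @ w0

# Usage: a purely rotational update is recovered with zero residual.
rng = np.random.default_rng(0)
w0 = rng.normal(size=(8, 20))
r_true, _ = np.linalg.qr(rng.normal(size=(8, 8)))   # random orthogonal matrix
w_ft = r_true @ w0
rot, resid = orthogonal_residual_split(w_ft, w0)
```

The orthogonal factors can then be merged on the group, and the residuals averaged or combined with any standard method.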
3. Merging Algorithms and Implementation
Three principal OrthoMerge algorithms are instantiated from these principles:
| Method | Key Step | Where Applied |
|---|---|---|
| DO-Merging | Decouple magnitude and direction, orthogonalize | LoRA module merging (post hoc, data-free) |
| OSRM | Orthogonalize LoRA subspace (pre fine-tuning) | LoRA module merging (pre-finetuning, data-driven) |
| Lie-Manifold Merge | Projection to Lie algebra, merge, map back | Orthogonal Finetuning, general finetuned adapters |
DO-Merging Algorithm (Zheng et al., 21 May 2025):
For each layer and task, compute $\Delta W_i = B_i A_i$ and decompose it into magnitude $m_i$ and direction $V_i$; orthogonalize the directions via small perturbations $\Delta V_i$ found by gradient descent; sum the magnitudes and the orthogonalized directions; reconstruct the merged update $\Delta W$ and add it to the base weights.
OSRM Procedure (Zhang et al., 28 May 2025):
Before fine-tuning, collect latent features for each task; for each task and layer, initialize $A_i$ with the bottom eigenvectors of the covariance of all other tasks' features. Fine-tune as usual, then merge the LoRA adapters with any standard technique (e.g., Task Arithmetic, Fisher merging, RegMean).
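The effect of the OSRM initialization on a subsequent Task Arithmetic merge can be illustrated with synthetic data. The simplifying assumptions here are ours, not the paper's: fine-tuning is skipped (random $B_i$ stand in for trained ones and each $A_i$ stays at its initialization), each task's features occupy a random low-dimensional subspace, and the bottom eigenvectors of $H^\top H$ are obtained equivalently as the right singular vectors of $H$ with the smallest singular values. All names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
T, n, k, d, r, sub = 3, 100, 32, 16, 4, 6

# Synthetic latent features: task t lives in its own random 6-dim subspace of R^k.
feats = [rng.normal(size=(n, sub)) @ rng.normal(size=(sub, k)) for _ in range(T)]

def bottom_right_singular(h: np.ndarray, r: int) -> np.ndarray:
    """Right singular vectors of h with the r smallest singular values,
    i.e. the bottom-r eigenvectors of h^T h, as rows of an (r, k) matrix."""
    _, _, vt = np.linalg.svd(h)              # vt is (k, k), sorted descending
    return vt[-r:]

# OSRM-style A_i: built from (and annihilating) every other task's features.
A_osrm = [bottom_right_singular(np.vstack([f for j, f in enumerate(feats) if j != i]), r)
          for i in range(T)]
A_rand = [rng.normal(size=(r, k)) for _ in range(T)]   # ordinary init, for contrast
B = [rng.normal(size=(d, r)) for _ in range(T)]        # stand-ins for trained B_i

def task_arithmetic(As, Bs, lam=1.0):
    """Merged LoRA update: lam * sum_i B_i A_i."""
    return lam * sum(b @ a for a, b in zip(As, Bs))

x0 = feats[0].T                                        # task-0 inputs, shape (k, n)
own = B[0] @ A_osrm[0] @ x0                            # task 0's own contribution
merged_osrm = task_arithmetic(A_osrm, B) @ x0
merged_rand = task_arithmetic(A_rand, B) @ x0
```

With the ordinary initialization, the other tasks' adapters leak into task 0's outputs; with the OSRM-style initialization, the merged update acts on task-0 inputs as task 0's own adapter does (up to round-off), because every $A_j$ with $j \neq 0$ annihilates task-0 features. Real features are not exactly low-rank, so in practice the leakage is reduced rather than eliminated.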
Orthogonal Model Merging (Yang et al., 5 Feb 2026):
Given OFT-trained models, map the orthogonal weight updates to the Lie algebra, average them, and map the result back to the group. For general adapters, extract the orthogonal part via SVD and merge the residuals additively.
4. Empirical Evaluation and Performance
Experimental studies validate OrthoMerge approaches across vision, language, and multi-modal domains:
- DO-Merging (Zheng et al., 21 May 2025):
- Vision (ViT-B/32, 8 tasks): Task Arithmetic 74.06%, DO-Merging 77.88% (+3.82%)
- Medium NLP (T5-base, 8 tasks): Task Arithmetic 77.4%, DO-Merging 80.9% (+3.5%)
- Large LLMs (LLaMa3-8B, 6 tasks): Task Arithmetic 83.55%, DO-Merging 87.11% (+3.56%)
- Orthogonalization alone yields ~2%, decoupling alone ~1%, combined ~3% normalized accuracy gains
- OSRM (Zhang et al., 28 May 2025):
- On GLUE with RoBERTa-large, adding OSRM improves each merging method on average: Task Arithmetic +6.6pp, RegMean +1.9pp, Fisher +7.0pp, TIES +5.3pp, EMR +2.1pp
- Robust to hyperparameters: merge scaling, number of latent features $n$, number of tasks $T$, and type of LoRA block
- Orthogonal Model Merging (Yang et al., 5 Feb 2026):
- When merging OFT models, in-domain accuracy: OrthoMerge 46.25% vs. baselines 44.10–44.97%; out-of-domain: OrthoMerge 41.80% vs. 40.78–40.97%
- When applied to general adapters via Orthogonal-Residual Decoupling, consistently boosts all baselines by 0.2–2.4 points
- Exact preservation of hyperspherical energy; mitigates catastrophic forgetting
5. Integration and Practical Considerations
OrthoMerge algorithms are designed to integrate seamlessly into existing merging pipelines:
- Plug-and-play: neither OSRM nor DO-Merging requires changes to existing post-hoc merging code. OSRM acts before fine-tuning (it only changes the LoRA initialization), while DO-Merging runs after fine-tuning and is data-free.
- Computational cost: DO-Merging needs only $\mathcal{O}(T^2)$ pairwise inner products per layer for $T$ tasks, so its overhead is on the order of GPU-minutes even for deep models. OSRM costs one eigendecomposition per layer per task.
- Scalability: each method is robust to the number of tasks and features; OSRM in particular maintains high performance as the number of tasks grows.
- Hyperparameters: standard LoRA settings (rank, number of latent features per task) apply; $A_i$ can be kept strictly orthogonal or softly constrained during fine-tuning.
6. Theoretical Guarantees and Analysis
- Magnitude imbalance: Merge loss is minimized when the merged LoRA modules have matched magnitude vectors ($m_1 = m_2 = \dots = m_T$) (Zheng et al., 21 May 2025).
- Benefit of decoupling: The expected merge loss $\mathcal{L}_{\text{merge}}$ is lower for decoupled-then-merged updates than for a naive linear merge when magnitudes differ.
- Orthogonality reduces conflict: Stricter orthogonality between direction matrices reduces “sign conflicts,” minimizing destructive interference and preserving task-specific signal (Zheng et al., 21 May 2025, Zhang et al., 28 May 2025).
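- Group manifold averaging: For OFT models, averaging through the Lie algebra (a manifold-aware alternative to Euclidean averaging) keeps the merged matrix orthogonal, so norms and rotational structure are preserved and the merged adapter remains valid (Yang et al., 5 Feb 2026).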
7. Related Approaches and Common Misconceptions
A frequent misconception is that parameter-space orthogonality between LoRA deltas $\Delta W_i$ suffices to prevent task interference. However, unless the data features for different tasks are taken into account, latent cross-talk persists at inference because parameter-space separation does not guarantee output-space orthogonality. Data-driven or geometric orthogonality (as in OSRM and manifold-based OrthoMerge) is necessary for robust interference suppression (Zhang et al., 28 May 2025, Yang et al., 5 Feb 2026).
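A toy 2-by-2 example (our own illustration, not taken from the cited papers) makes the point concrete. The two deltas below are orthogonal under the Frobenius inner product, and even their column overlap $\Delta W_1^\top \Delta W_2$ is zero, yet both read the same input direction $e_1$, so task 2's delta still changes the merged model's output on a task-1 input. Suppressing this requires the row space of $\Delta W_2$ to be orthogonal to task-1 features, which is the data-dependent condition OSRM targets.

```python
import numpy as np

dw1 = np.array([[1.0, 0.0], [0.0, 0.0]])   # task 1: reads e1, writes e1
dw2 = np.array([[0.0, 0.0], [1.0, 0.0]])   # task 2: reads e1, writes e2
x1 = np.array([1.0, 0.0])                   # a task-1 input feature

frob_inner = np.sum(dw1 * dw2)               # <dW1, dW2>_F
col_overlap = dw1.T @ dw2                    # column-space overlap (all zeros here)
cross_talk = dw2 @ x1                        # task 2's delta applied to a task-1 input
merged_out = (dw1 + dw2) @ x1                # Task Arithmetic merge with unit weights
print(frob_inner, cross_talk, merged_out)    # 0.0 [0. 1.] [1. 1.]
```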
OrthoMerge is distinct from pre-merging methods that rely solely on pruning or clustering, and from model soups that apply linear combinations without structure-awareness. Its key contribution is the explicit management of both algebraic and geometric subspace overlap in the merging process.
References:
- "Decouple and Orthogonalize: A Data-Free Framework for LoRA Merging" (Zheng et al., 21 May 2025)
- "Orthogonal Model Merging" (Yang et al., 5 Feb 2026)
- "Unraveling LoRA Interference: Orthogonal Subspaces for Robust Model Merging" (Zhang et al., 28 May 2025)