Orthogonal Iterative Merging (OIM)

Updated 30 June 2026

Orthogonal Iterative Merging (OIM) is a geometry-aware algorithm that sequentially integrates multiple task-specific neural network updates while preserving their uniqueness.
It employs iterative orthogonalization methods, such as Gram–Schmidt and SVD-based truncation, to ensure task deltas remain mutually orthogonal and reduce interference.
Empirical results across vision, language, and multimodal benchmarks demonstrate improved accuracy, reversibility, and efficient continual model integration with minimal overhead.

Orthogonal Iterative Merging (OIM) is a class of training‐free, geometry-aware algorithms that sequentially integrate a collection of fine-tuned or specialized models into a single unified model. OIM enforces mutual orthogonality of task-specific updates in parameter space, thereby minimizing interference and catastrophic forgetting when merging knowledge from diverse tasks. Contemporary OIM methods are prominent in continual learning, model merging, and modular systems for deep neural networks, exhibiting empirical gains in accuracy, stability, and reversibility across a range of architectures and benchmark scenarios (Tang et al., 16 Jan 2025, Khan et al., 28 Jul 2025, Yang et al., 5 Feb 2026, Shah et al., 2024).

1. Mathematical and Geometric Foundations

OIM is rooted in orthogonalization processes and projections in high-dimensional parameter spaces. For neural models with parameter vector $\theta_0 \in \mathbb{R}^d$ (the shared base), independently fine-tuned models $\theta_i$ ($i=1,\ldots,N$) define "task deltas" $\Delta_i = \theta_i - \theta_0$. Orthogonal iterative merging enforces $\langle\Delta_i^\perp,\Delta_j^\perp\rangle=0$ for $i\neq j$, where $\Delta_i^\perp$ denotes the orthogonalized version of $\Delta_i$, so each merged task's update lies in the orthogonal complement of the subspace spanned by previously integrated deltas.

For specialized settings where the model weights are linear operators, OIM extends to the orthogonal group $O(n)=\{O\in\mathbb{R}^{n\times n}:O^\top O=I\}$, and uses the associated Lie algebra $\mathfrak{so}(n)$ of skew-symmetric matrices. In this regime, merging operations are performed on the Riemannian manifold of the orthogonal group, preserving the geometric structure of the weight tensors and enabling principled manifold-aware integration of orthogonal adaptations (Yang et al., 5 Feb 2026).

2. Core OIM Algorithms

The generic OIM workflow is as follows:

Orthogonalization: Each task delta $\Delta_i$ is projected onto the orthogonal complement of the subspace spanned by previous deltas, typically via Gram–Schmidt:

$\Delta_i^\perp = \Delta_i - \sum_{j=1}^{i-1} \frac{\langle\Delta_i, \Delta_j^\perp\rangle}{\|\Delta_j^\perp\|^2} \Delta_j^\perp.$

High-dimensional variants may employ SVD-based truncation (Tang et al., 16 Jan 2025) or block orthogonalization (Khan et al., 28 Jul 2025).

Iterative merging: The merged parameters are formed by adding all orthogonalized deltas to the base, with optional learnable combination coefficients to balance performance:

$\theta_{\text{merged}} = \theta_0 + \sum_{i=1}^{N} \alpha_i \Delta_i^\perp,$

with the coefficients $\alpha_i$ typically learned by minimizing an aggregate objective across task losses with a consistency penalty (Khan et al., 28 Jul 2025).

Gradient-based refinement: OIM optionally incorporates a further stage of joint optimization over the merged parameters to directly minimize the sum of per-task validation losses plus a quadratic regularization towards the naive sum of orthogonalized deltas.
Continual and modular update: New tasks can be added incrementally by projecting their delta against the current orthogonal set and repeating the merge, without revisiting or re-projecting previous deltas. Removal ("unmerge") of a task is achieved by subtracting its orthogonalized delta, ensuring minimal collateral effect on other tasks (Khan et al., 28 Jul 2025).
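
The workflow above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation from the cited papers: the function names `orthogonalize`, `merge`, and `unmerge` are hypothetical, parameters are treated as flat vectors, and the coefficients `alphas` default to 1 instead of being learned.

```python
import numpy as np

def orthogonalize(delta, basis, eps=1e-12):
    """Project `delta` onto the orthogonal complement of span(basis).

    Uses modified Gram-Schmidt: each projection is applied to the running
    remainder, which equals the classical formula when `basis` is orthogonal.
    """
    out = np.asarray(delta, dtype=float).copy()
    for b in basis:
        norm_sq = float(b @ b)
        if norm_sq > eps:  # skip directions that are numerically zero
            out -= (out @ b) / norm_sq * b
    return out

def merge(theta0, deltas, alphas=None):
    """Return merged parameters and the list of orthogonalized deltas."""
    basis = []
    for d in deltas:  # incremental: each new delta is projected once
        basis.append(orthogonalize(d, basis))
    if alphas is None:
        alphas = np.ones(len(basis))  # assumption: unit coefficients
    merged = theta0 + sum(a * b for a, b in zip(alphas, basis))
    return merged, basis

def unmerge(theta_merged, basis, k, alphas=None):
    """Remove task k by subtracting its (scaled) orthogonalized delta."""
    a = 1.0 if alphas is None else alphas[k]
    return theta_merged - a * basis[k]

# Toy data: a random base and three random task deltas (illustrative only).
rng = np.random.default_rng(0)
theta0 = rng.normal(size=8)
deltas = [rng.normal(size=8) for _ in range(3)]
merged, basis = merge(theta0, deltas)

# Gram matrix of the orthogonalized deltas; off-diagonals should be ~0.
gram = np.array([[bi @ bj for bj in basis] for bi in basis])
```

Because the stored directions are mutually orthogonal, subtracting one of them leaves the components along the others unchanged, which is the property the unmerge step relies on.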

For model weights representable as orthogonal matrices (e.g., after Orthogonal Fine-Tuning, OFT), merging is done on the orthogonal group via Lie algebra averaging. Each expert model's orthogonal update is mapped to a skew-symmetric element of the Lie algebra $\mathfrak{so}(n)$, the resulting elements are averaged, the average is optionally norm-corrected, and the result is mapped back to $O(n)$. If models are not OFT-trained, OIM applies the orthogonal Procrustes solution to extract the nearest orthogonal component and merges the residual in weight space (Yang et al., 5 Feb 2026).
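
The averaging step can be illustrated with a small NumPy sketch. Several choices here are assumptions rather than details confirmed by the source: the Cayley transform is used as the map between skew-symmetric matrices and orthogonal matrices (it is a common parameterization in OFT, but the cited method may use the matrix logarithm and exponential instead), the norm correction rescales the average to the mean Frobenius norm of the inputs, and the function names are hypothetical.

```python
import numpy as np

def cayley(A):
    """Map a skew-symmetric matrix A to the orthogonal matrix (I - A)^{-1}(I + A)."""
    I = np.eye(A.shape[0])
    return np.linalg.solve(I - A, I + A)

def inverse_cayley(R):
    """Map an orthogonal matrix R (no eigenvalue -1) back to a skew-symmetric matrix."""
    I = np.eye(R.shape[0])
    return np.linalg.solve(R + I, R - I)

def merge_orthogonal(rotations, rescale=True):
    """Average orthogonal updates in skew-symmetric coordinates and map back."""
    skews = [inverse_cayley(R) for R in rotations]
    mean = sum(skews) / len(skews)
    if rescale:
        # Assumed magnitude correction: averaging shrinks the norm, so rescale
        # the mean to the average Frobenius norm of the individual elements.
        target = np.mean([np.linalg.norm(A) for A in skews])
        norm = np.linalg.norm(mean)
        if norm > 0:
            mean = mean * (target / norm)
    return cayley(mean)

# Toy experts: random skew-symmetric generators mapped to orthogonal matrices.
rng = np.random.default_rng(0)
rotations = []
for _ in range(3):
    M = 0.3 * rng.normal(size=(4, 4))
    rotations.append(cayley(M - M.T))  # M - M.T is skew-symmetric

merged = merge_orthogonal(rotations)
print(np.allclose(merged.T @ merged, np.eye(4)))  # orthogonality check
```

The merged matrix stays orthogonal because any scaled average of skew-symmetric matrices is itself skew-symmetric, whereas averaging the orthogonal matrices directly would generally leave the group.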

3. Theoretical Properties and Analysis

A primary property of OIM is provable minimization of task interference. By construction, new task deltas are orthogonal in parameter space to those of previous tasks,

$\langle\Delta_i^\perp,\Delta_j^\perp\rangle = 0 \quad \text{for all } j < i,$

which, under quadratic or locally linearized loss, ensures that cross-terms in the merged loss are eliminated, minimizing backward transfer (BWT) degradation (Tang et al., 16 Jan 2025, Khan et al., 28 Jul 2025).
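
As a worked illustration of why the cross-terms drop out, consider a simplified setting that is an assumption of this sketch and not a claim from the cited papers: the loss $L$ is exactly quadratic around $\theta_0$ with gradient $g$ and isotropic curvature $H=\lambda I$. Expanding the loss at the merged point gives

$L\left(\theta_0 + \sum_i \Delta_i^\perp\right) = L(\theta_0) + \sum_i g^\top \Delta_i^\perp + \frac{\lambda}{2}\sum_i \|\Delta_i^\perp\|^2 + \lambda \sum_{i<j} \langle\Delta_i^\perp,\Delta_j^\perp\rangle.$

The last sum collects all interaction terms between tasks, and it vanishes when the deltas are mutually orthogonal, so the merged loss decomposes into independent per-task contributions. For a general Hessian $H$ the interaction terms are $(\Delta_i^\perp)^\top H \Delta_j^\perp$, which Euclidean orthogonality alone does not force to zero; in that case the statement holds only approximately.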

For vector sets, OIM-type randomized pairwise orthogonalization (akin to Kaczmarz-type methods) converges almost surely to an orthonormal basis. Each pairwise step can only increase the volume spanned by the vectors, and the analysis bounds the number of pairwise steps needed to reach a prescribed volume accuracy (Shah et al., 2024).
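
A minimal sketch of such a randomized pairwise scheme is given below, assuming the update rule "pick a random pair, replace one vector with its normalized component perpendicular to the other". The function name `pairwise_orthogonalize` and the toy dimensions are illustrative choices, and the volume is measured as the absolute determinant of the matrix whose columns are the vectors.

```python
import numpy as np

def pairwise_orthogonalize(vectors, steps, seed=0):
    """Randomized pairwise orthogonalization of the columns of a square matrix."""
    rng = np.random.default_rng(seed)
    V = vectors / np.linalg.norm(vectors, axis=0)  # unit-norm columns
    n = V.shape[1]
    volumes = [abs(np.linalg.det(V))]
    for _ in range(steps):
        i, j = rng.choice(n, size=2, replace=False)  # random ordered pair
        # Remove from column i its component along column j, then renormalize.
        residual = V[:, i] - (V[:, i] @ V[:, j]) * V[:, j]
        V[:, i] = residual / np.linalg.norm(residual)
        volumes.append(abs(np.linalg.det(V)))
    return V, np.array(volumes)

rng = np.random.default_rng(1)
V, volumes = pairwise_orthogonalize(rng.normal(size=(4, 4)), steps=2000)
```

Subtracting a multiple of one column from another leaves the determinant unchanged, and the renormalization divides by a norm of at most 1, so the recorded volume never decreases from one step to the next.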

In manifold settings, OIM methods operate on the canonical Riemannian metric of $O(n)$, with merging via geodesic averaging in the Lie algebra. The merged update remains on the orthogonal group, ensuring preservation of weight geometry and hyperspherical energy, and mitigating norm collapse via explicit magnitude correction (Yang et al., 5 Feb 2026).

Under continual integration, the norm of the aggregate merged update remains bounded by the largest task drift, and the memory cost per merge is independent of the number of tasks (Tang et al., 16 Jan 2025).

4. Computational and Practical Considerations

OIM is designed for scalability and memory efficiency. Only the current base, merged parameters, and the incoming expert model are required at each step, with memory overhead independent of the total number of tasks (Tang et al., 16 Jan 2025, Khan et al., 28 Jul 2025). Projection and orthogonalization cost per step depends on the model dimension $d$ and the number of tasks $N$ (a Gram–Schmidt projection against $N$ stored directions costs on the order of $Nd$ operations), while SVD-based variants pay a per-layer decomposition cost. Large-scale variants exploit sparse representations or blockwise processing.

Parallelization is straightforward: pairwise or blockwise orthogonalizations can be performed on disjoint parameter subsets (Shah et al., 2024), and projection steps can be implemented efficiently for high-dimensional and sparse deltas. The gradient-based merging phase admits standard optimizers, early stopping, and validation-based model selection (Khan et al., 28 Jul 2025).

Elastic Weight Consolidation (EWC) and synthetic replay mechanisms can be layered atop OIM's merge objective, further stabilizing model parameters and preventing drift away from $\theta_0$ during repeated merges (Khan et al., 28 Jul 2025).
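
For reference, the standard EWC regularizer is a Fisher-weighted quadratic penalty on the distance to a reference point. The sketch below shows that generic form only; how the cited work combines it with the merge objective is not specified here, and the names `ewc_penalty`, `fisher_diag`, and `lam` are illustrative.

```python
import numpy as np

def ewc_penalty(theta, theta_ref, fisher_diag, lam=1.0):
    """Standard EWC penalty: (lam / 2) * sum_k F_k * (theta_k - theta_ref_k)^2."""
    diff = np.asarray(theta, dtype=float) - np.asarray(theta_ref, dtype=float)
    return 0.5 * lam * float(np.sum(np.asarray(fisher_diag) * diff ** 2))

# Toy values: the penalty is zero at the reference and grows with weighted drift.
theta_ref = np.array([0.0, 0.0])
theta = np.array([1.0, 2.0])
fisher_diag = np.array([1.0, 0.5])  # assumed diagonal Fisher estimate
penalty = ewc_penalty(theta, theta_ref, fisher_diag, lam=2.0)
```

A regularized merge objective would add this penalty to the task losses, so coordinates with large Fisher values are held close to the reference while unimportant coordinates remain free to move.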

5. Empirical Performance and Comparative Findings

OIM has been extensively validated on vision (CIFAR-100, ImageNet-100), language (AG News, Yahoo, DBpedia), and multimodal (CLIP-ViT) benchmarks. Across these domains, OIM achieves:

Higher average accuracy: E.g., OIM achieves 78.4% average accuracy on continual CIFAR-100 compared to 72.1% for prior TIES-Merging (Khan et al., 28 Jul 2025).
Reduced catastrophic forgetting: Positive BWT (+0.12), indicating prior task performance is preserved after new merges (Khan et al., 28 Jul 2025).
Near-constant memory and rapid composition: Recovery and unmerge times are consistently lower than retraining-based alternatives.
Interference minimization and reversibility: Empirical results demonstrate that OIM’s orthogonal projections enable unmerging of tasks with minimal accuracy drop (1.8% vs. 8–15% for baselines) (Khan et al., 28 Jul 2025).
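
Backward transfer is commonly computed from a matrix of accuracies recorded after each merge. The sketch below uses the usual continual-learning definition, the average change in accuracy on earlier tasks between the time they were added and the end of the sequence; the cited papers may use a different scale or normalization, and the accuracy values are made up for illustration.

```python
import numpy as np

def backward_transfer(acc):
    """BWT = mean over earlier tasks i of acc[T-1, i] - acc[i, i].

    acc[t, i] is the accuracy on task i measured after merging task t.
    """
    acc = np.asarray(acc, dtype=float)
    return float(np.mean(acc[-1, :-1] - np.diag(acc)[:-1]))

# Illustrative accuracy matrix for three sequentially merged tasks.
acc = [
    [0.80, 0.00, 0.00],
    [0.82, 0.70, 0.00],
    [0.81, 0.73, 0.90],
]
bwt = backward_transfer(acc)  # ((0.81 - 0.80) + (0.73 - 0.70)) / 2
```

A positive value means earlier tasks ended up slightly better after later merges, while a negative value indicates forgetting.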

In the LLM setting, Orthogonal Model Merging (OrthoMerge, leveraging the OIM paradigm) has demonstrated superior task synergy and less forgetting compared to Euclidean model arithmetic and norm-uncorrected manifold averaging. When merging five OFT-trained Llama-3.1-8B experts, OIM achieved 46.25% average accuracy, compared with 41.46% for task arithmetic and 44.71% for expert averaging. The advantage grows as the number of tasks increases (Yang et al., 5 Feb 2026).

Ablations show that omitting orthogonal projections, magnitude correction, or stability penalties results in substantial accuracy degradation.

6. Extensions, Applications, and Limitations

OIM's modular design enables extensions to:

Model compliance and reversibility: OIM enables not only continual integration but also the ability to "unmerge" selected task adaptations efficiently, crucial for compliance with privacy and data retention requirements (Khan et al., 28 Jul 2025).
Hybrid merge strategies: For models not natively OFT-trained, OIM with Orthogonal–Residual Decoupling solves an orthogonal Procrustes problem to extract group elements, merging on the manifold and in Euclidean space for residuals (Yang et al., 5 Feb 2026).
Streaming and blockwise orthogonalization: Streaming data or large parameter sets can be handled by online or blockwise variants (Shah et al., 2024).
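
The orthogonal Procrustes step mentioned above has a closed-form SVD solution. The sketch below assumes one plausible formulation, finding the orthogonal $R$ that minimizes $\|R W_0 - W\|_F$ for a base weight $W_0$ and a fine-tuned weight $W$, and treating $W - R W_0$ as the Euclidean residual. The exact decomposition used in the cited work may differ, and `orthogonal_residual_split` is a hypothetical name.

```python
import numpy as np

def orthogonal_residual_split(W, W0):
    """Split W into an orthogonal action on W0 plus a residual.

    Solves min_R ||R @ W0 - W||_F over orthogonal R via the SVD of W @ W0.T.
    """
    U, _, Vt = np.linalg.svd(W @ W0.T)
    R = U @ Vt                # closed-form Procrustes solution
    residual = W - R @ W0     # part not explained by the orthogonal update
    return R, residual

# Toy check: a weight that is an exact rotation of the base has zero residual.
rng = np.random.default_rng(0)
W0 = np.eye(4) + 0.1 * rng.normal(size=(4, 4))   # well-conditioned base
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)))     # random orthogonal matrix
R, residual = orthogonal_residual_split(Q @ W0, W0)
```

Under this formulation the orthogonal factors can be merged on the manifold while the residuals are combined with ordinary weight-space arithmetic.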

Limitations include increased per-step computation over naive interpolation methods due to orthogonalization and optimization steps, storage of orthogonal deltas for unmerge capability, and possible sensitivity to parameter sparsity in extremely high-dimensional settings.

7. Relationship to Classical and Contemporary Orthogonalization Methods

OIM is conceptually related to classical Gram–Schmidt and Householder QR, but offers distinct advantages in randomized, parallel, and modular settings. Pairwise randomized (Kaczmarz-inspired) OIM is simple, parallelizable, and monotone in the spanned volume, though Householder QR remains worst-case optimal for numerical precision. OIM's innovation is in extending these principles to model parameter space for scalable, interference-minimizing continual merging, offering a principled and robust foundation for modular AI systems (Shah et al., 2024, Tang et al., 16 Jan 2025, Khan et al., 28 Jul 2025).


References (4)

Merging Models on the Fly Without Retraining: A Sequential Approach to Scalable Continual Model Merging (2025)

Modular Delta Merging with Orthogonal Constraints: A Scalable Framework for Continual and Reversible Model Composition (2025)

Orthogonal Model Merging (2026)

A Kaczmarz-Inspired Method for Orthogonalization (2024)
