
Subspace-Boosted Model Merging (2506.16506v1)

Published 19 Jun 2025 in cs.LG, cs.AI, and cs.CV

Abstract: Model merging enables the combination of multiple specialized expert models into a single model capable of performing multiple tasks. However, the benefits of merging an increasing amount of specialized experts generally lead to diminishing returns and reduced overall performance gains. In this work, we offer an explanation and analysis from a task arithmetic perspective; revealing that as the merging process (across numerous existing merging methods) continues for more and more experts, the associated task vector space experiences rank collapse. To mitigate this issue, we introduce Subspace Boosting, which operates on the singular value decomposed task vector space and maintains task vector ranks. Subspace Boosting raises merging efficacy for up to 20 expert models by large margins of more than 10% when evaluated on vision benchmarks. Moreover, we propose employing Higher-Order Generalized Singular Value Decomposition to further quantify task similarity, offering a new interpretable perspective on model merging.


Summary

  • The paper introduces Subspace Boosting, an SVD-based method that preserves expressivity in merged task vectors and mitigates rank collapse.
  • The paper extends the approach with HO-GSVD to analyze expert overlap and enable optimal expert selection for reduced interference.
  • Empirical tests on ViT vision experts show a 10%+ accuracy gain with minimal overhead, confirming the method’s robustness and scalability.

Subspace-Boosted Model Merging: A Technical Overview

Model merging has become a critical technique for aggregating specialized expert models into a single, efficient model that inherits capabilities from multiple domains. However, as the number of experts increases, traditional merging strategies often experience diminishing returns and a drop in overall performance. The paper "Subspace-Boosted Model Merging" (2506.16506) addresses this degradation through comprehensive analysis and introduces an SVD-based approach that preserves the expressivity of merged task vectors, resulting in significantly improved multi-task performance.

Background: Task Arithmetic and Rank Collapse

Existing model merging methods frequently employ task arithmetic, representing the delta between finetuned expert weights $\theta_i$ and a base model $\theta_{\mathrm{base}}$ as task vectors $\Delta_i = \theta_i - \theta_{\mathrm{base}}$. The merged weight vector is often computed via linear interpolation:

theta_merged = theta_base + alpha * sum(task_vectors)
Prior work demonstrates that this merging strategy, while providing improvement over the base model, leads to a phenomenon termed "rank collapse". As more expert task vectors are summed, the resulting (merged) task vector occupies a lower-dimensional subspace, leading to redundancy and reduced generalization. Empirically, the stable rank of merged task vectors drops sharply as the number of experts increases, often falling by a factor of five or more, limiting the merged model’s ability to accommodate diverse tasks.
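
A useful diagnostic here is the stable rank (the squared Frobenius norm divided by the squared spectral norm) of the merged task vector. The snippet below is a toy illustration of the trend only: the "task vectors" are random matrices sharing a common low-rank component, an assumption made purely so the example runs standalone, not real expert deltas.

import numpy as np

def stable_rank(matrix):
    # Squared Frobenius norm divided by squared spectral norm.
    s = np.linalg.svd(matrix, compute_uv=False)
    return float((s ** 2).sum() / (s[0] ** 2))

rng = np.random.default_rng(0)
# Correlated stand-ins for expert task vectors: a shared low-rank part plus noise.
shared = rng.standard_normal((256, 64)) @ rng.standard_normal((64, 256)) / 8.0
task_vectors = [shared + rng.standard_normal((256, 256)) for _ in range(20)]

for n in (1, 5, 10, 20):
    merged = sum(task_vectors[:n])
    # The stable rank of the sum drifts toward that of the shared low-rank component.
    print(n, round(stable_rank(merged), 1))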

Subspace Boosting: Algorithmic Description

To directly address rank collapse, the authors propose Subspace Boosting—a simple, effective method that operates on the singular value decomposition (SVD) of the merged task vectors. For each linear or attention weight matrix in the merged model, Subspace Boosting:

  1. Performs SVD on the merged task vector matrix: $A = U \Sigma V^T$.
  2. Computes the cumulative explained variance and identifies a boosting threshold $\beta$, defining how much of the spectrum to enhance.
  3. Clamps or boosts all singular values below the threshold to the threshold value, thus preserving more directions in the weight space.
  4. Reconstructs the weight matrix from the boosted singular values and the original singular vectors.

Pseudocode for a single layer:

import numpy as np

def subspace_boosting(weight_matrix, beta=0.0):
    # beta is the fraction of cumulative explained variance left untouched;
    # beta = 0 keeps the matrix as-is (no boosting).
    if beta <= 0.0:
        return weight_matrix
    U, S, Vt = np.linalg.svd(weight_matrix, full_matrices=False)
    # First index at which the cumulative explained variance reaches beta.
    idx = np.searchsorted(np.cumsum(S) / S.sum(), beta)
    # Raise every singular value from idx onward to the value at idx,
    # flattening the tail of the spectrum and preserving more directions.
    S_boosted = S.copy()
    if idx < len(S):
        S_boosted[idx:] = S[idx]
    return U @ np.diag(S_boosted) @ Vt

Notably, the approach is agnostic to the merging rule (e.g., Task Arithmetic, TIES, Consensus) and requires only a single hyperparameter $\beta$, which was empirically found to be robust to tuning.
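
As a rough sketch of how this slots into a merging pipeline, the snippet below applies subspace_boosting from above to each matrix-shaped parameter delta after summing the expert task vectors. The dictionary-of-arrays model representation and the values of alpha and beta are assumptions made for illustration, not the paper's implementation.

def merge_with_boosting(theta_base, expert_thetas, alpha=0.3, beta=0.05):
    # Task-arithmetic merge with per-layer Subspace Boosting (illustrative sketch).
    merged = {}
    for name, base_w in theta_base.items():
        # Summed task vector for this parameter across all experts.
        delta = sum(theta[name] - base_w for theta in expert_thetas)
        if delta.ndim == 2:
            # Boost the spectrum of 2-D deltas (linear/attention weights).
            delta = subspace_boosting(delta, beta=beta)
        merged[name] = base_w + alpha * delta
    return merged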

Higher-Order Generalized SVD for Interpretable Merging

The paper extends Subspace Boosting with Higher-Order Generalized SVD (HO-GSVD), enabling decomposition of a set of task vectors $\{A_1, A_2, \ldots, A_N\}$ into shared and unique components. HO-GSVD produces a common subspace for all experts, allowing direct analysis of overlap and difference among expert contributions. The resulting "Alignment Matrix" quantifies which expert pairs will exhibit less destructive interference upon merging and thus helps in expert subset selection for optimal merging.

Algorithmically, after HO-GSVD on the set of weight differentials, expert selection can be performed by analyzing pairwise alignment scores, prioritizing experts whose task vectors occupy orthogonal subspaces to minimize interference.
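
The paper derives its alignment scores from HO-GSVD; as a loose stand-in, the overlap between the top singular subspaces of two task vectors can be measured with ordinary per-matrix SVDs, which yields a similar, though not identical, interference diagnostic. The helper below is an illustrative sketch under that substitution; subspace_overlap, alignment_matrix, and the choice of k are not from the paper.

import numpy as np

def subspace_overlap(delta_a, delta_b, k=16):
    # Mean squared cosine of the principal angles between the top-k left singular
    # subspaces: ~0 means near-orthogonal (low interference), ~1 means identical.
    Ua, _, _ = np.linalg.svd(delta_a, full_matrices=False)
    Ub, _, _ = np.linalg.svd(delta_b, full_matrices=False)
    cos = np.linalg.svd(Ua[:, :k].T @ Ub[:, :k], compute_uv=False)
    return float((cos ** 2).mean())

def alignment_matrix(task_vectors, k=16):
    # Symmetric matrix of pairwise alignment scores; low off-diagonal entries
    # suggest expert pairs that should interfere less when merged.
    n = len(task_vectors)
    M = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            M[i, j] = M[j, i] = subspace_overlap(task_vectors[i], task_vectors[j], k)
    return M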

Empirical Results and Claims

  • Boosted rank: Application of Subspace Boosting maintains the stable rank of merged task vectors close to their initial (unmerged) levels, allowing more effective capacity sharing.
  • Substantial gains: For the challenging setting of merging 14 or 20 ViT-based vision experts, Subspace Boosting leads to gains of 10% or more in aggregate classification accuracy compared to the strongest unboosted baseline.
  • Robustness and generality: Subspace Boosting brings simple task arithmetic to parity with, or ahead of, more sophisticated merging methods, and can be used in conjunction with other post-processing techniques such as LiNeS.
  • Interpretability: HO-GSVD not only enables expert selection but also provides actionable diagnostics about directions of subspace redundancy and interference, a property largely missing in previous model merging methodologies.

Table 1 from the paper illustrates these results across three merging baselines and three model sizes (ViT-B/32, ViT-B/16, ViT-L/14), consistently showing Subspace Boosting improving or matching state-of-the-art approaches, particularly as the number of merged experts increases.

Implementation Considerations and Limitations

  • Efficiency: Subspace Boosting introduces negligible computational overhead relative to standard merging, since the SVD is applied per layer to the merged task vector, which is computationally tractable for modern vision models and LLMs.
  • Scalability: As the number of experts increases, the authors observe that performance may eventually degrade due to irreducible overlaps between tasks. HO-GSVD can be leveraged to select optimal expert subsets, but further research is needed to enable scaling to very large ensembles.
  • Hyperparameter tuning: While $\beta$ is robust, optimal merging coefficients still require modest validation, a task that could be automated in future work.
  • Applicability: The method is directly compatible with any merging algorithm that operates in the space of task vectors, and is particularly well-suited to vision transformers and similar architectures with linear components.

Theoretical and Practical Implications

This work provides a quantitative explanation for the observed performance plateau in multi-expert model merging: rank collapse induced by cumulative aggregation of correlated task vectors. By applying subspace boosting, it is possible to maintain high-rank task vector spaces, thereby improving the representational capacity and generalization of merged models. The general approach draws a connection between linear algebraic properties of the weight space and practical multi-task learning objectives.

The paper’s insights open several future directions:

  • Automated merging coefficient selection via analysis of singular value spectra and alignment matrices.
  • Dynamic expert selection in large model pools based on subspace analysis, augmenting current static selection heuristics.
  • Extension to non-linear or adapter-based model merging by investigating subspace preservation in more complex architectures.

Concluding Remarks

"Subspace-Boosted Model Merging" provides both an empirical and theoretical advance in the area of model merging, demonstrating that SVD-guided rank preservation is essential for scaling merged models to large numbers of experts. The approach is simple to implement, robust across architectures and merging baselines, and introduces meaningful pathways for interpretability and expert selection using HO-GSVD. This work is likely to influence future research on collaborative, modular, and scalable AI systems that efficiently combine the strengths of multiple specialized models.