Homogeneous-Heterogeneous Model Merging
- Homogeneous-Heterogeneous Model Merging is the process of integrating models with identical or diverse architectures to consolidate specialized capabilities without additional retraining.
- Techniques include direct parameter averaging for homogeneous models and advanced alignment, projection, or adapter-based methods for heterogeneous models.
- Applications span large language models, multimodal systems, and federated learning, where merging enhances efficiency and enables unified performance in resource-limited environments.
Model merging is the process of consolidating multiple expert models, often fine-tuned independently on different data or tasks, into a single model, ideally without additional retraining or labels. This approach enables the aggregation of specialized capabilities, efficient model reuse, and deployment of multifaceted models in resource-limited environments. The practical and theoretical challenges of model merging differ sharply depending on whether the constituent models are homogeneous (identical architectures and parameterizations) or heterogeneous (structurally or functionally diverse). This article surveys both regimes: the challenges each presents, the methods that address them, and the current empirical and theoretical results.
1. Conceptual Distinction: Homogeneous vs. Heterogeneous Merging
Homogeneous model merging is defined as the operation of combining models that share precisely the same neural architecture, including layer organization, parameter shapes, module types, and in most cases even the tokenizer and embedding vocabulary (Yang et al., 2024). In this regime, all model parameters are directly aligned, permitting element-wise arithmetic (such as averaging or interpolation), and extending to more sophisticated parameter-space manipulations reliant on strict isomorphism.
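The element-wise arithmetic described above can be sketched in a few lines. This is a minimal illustration, assuming each checkpoint is a plain dictionary mapping parameter names to NumPy arrays; the function name `average_checkpoints` and the toy parameter names are hypothetical. The explicit key and shape checks encode the homogeneity requirement: if they fail, direct averaging is not defined.

```python
import numpy as np

def average_checkpoints(checkpoints, weights=None):
    """Weighted element-wise average of parameter dicts (hypothetical helper).

    Raises ValueError when the checkpoints are not homogeneous, i.e. when
    parameter names or shapes differ.
    """
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    if weights is None:
        weights = [1.0 / len(checkpoints)] * len(checkpoints)
    if len(weights) != len(checkpoints):
        raise ValueError("one weight per checkpoint is required")

    reference = checkpoints[0]
    for ckpt in checkpoints[1:]:
        if ckpt.keys() != reference.keys():
            raise ValueError("parameter names differ: models are not homogeneous")
        for name in reference:
            if ckpt[name].shape != reference[name].shape:
                raise ValueError(f"shape mismatch for {name!r}")

    return {
        name: sum(w * ckpt[name] for w, ckpt in zip(weights, checkpoints))
        for name in reference
    }

# Toy checkpoints with identical structure.
model_a = {"linear.weight": np.array([[1.0, 2.0], [3.0, 4.0]]),
           "linear.bias": np.array([0.0, 1.0])}
model_b = {"linear.weight": np.array([[3.0, 2.0], [1.0, 0.0]]),
           "linear.bias": np.array([2.0, 1.0])}

merged = average_checkpoints([model_a, model_b])
```

Passing `weights=[1 - t, t]` turns the same function into linear interpolation between two checkpoints.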
Heterogeneous model merging refers to consolidating models that differ at the architectural level (e.g., varying numbers of layers, hidden or attention dimensions, modality branches, or functional modules), at the semantic level (label space, inductive bias), or at the representational level (Xu et al., 2024, Chen et al., 2 Apr 2026, Bhattacharya et al., 22 Feb 2026, Du et al., 31 Mar 2025, Zhang et al., 27 Mar 2025). This requires resolving non-isomorphic parameter spaces, handling basis misalignment, and mitigating functional mismatches that preclude naive parameter-level arithmetic.
The distinction is critical because strategies that are provably effective for homogeneous merges—such as uniform weight averaging or delta-based task arithmetic—become ill-posed and empirically unstable when directly applied to heterogeneous models. In contrast, heterogeneous merging requires additional alignment, projection, or transformation steps, often involving architecture-aware mapping, surrogate functional representations, or hybrid objective formulations.
2. Taxonomy of Model Merging Methodologies
Model merging methods can be categorized by operational domain (homogeneous/heterogeneous), parameter alignment strategy, objective type, and robustness to model divergence (Yang et al., 2024). The taxonomy is summarized as follows:
| Regime | Merge Class | Representative Techniques |
|---|---|---|
| Homogeneous | Parameter-space | Linear interpolation, task arithmetic, Fisher-weighted averaging (Yang et al., 2024, Wang et al., 5 Mar 2026), TIES/DARE (Zhou et al., 3 Feb 2025), SVD-subspace merges, SLERP/Karcher mean (Wang et al., 5 Mar 2026) |
| Homogeneous | Subspace/routing | Mixture-of-Experts (MoE) merging, perplexity or gradient-based routing (Zhou et al., 3 Feb 2025), dynamic composition (twin-merging) (Lu et al., 2024) |
| Homogeneous | Post-calibration | Representation-matching, last-layer adjustment, layerwise reinitialization |
| Heterogeneous | Transform-then-merge | Knowledge distillation, adapter-based fusion (Chen et al., 2 Apr 2026), cross-architecture mapping (Xu et al., 2024, Zhang et al., 27 Mar 2025), function-space merges (Bhattacharya et al., 22 Feb 2026) |
| Heterogeneous | Output/label fusion | Zero-padding heads for label alignment (Hackmann, 2024), decision-space algebra for classifier merging (Giabbanelli et al., 2015) |
| Heterogeneous | Specialized search | Blockwise search, segment-based alignment (AutoMerge, SMA/LMA) (Lu et al., 30 Jan 2026, Xu et al., 2024) |
This organization is operationalized via two broad phases: (1) architectural and parametric alignment (pre-merge transformation), and (2) function- or parameter-space fusion (averaging, interpolation, projection, or optimization). The taxonomy guides method selection and expected performance.
3. Principled Algorithms for Homogeneous and Heterogeneous Merging
3.1. Homogeneous Model Merging Algorithms
Linear parameter averaging, task arithmetic, and Fisher-weighted averaging are extensively validated for homogeneous models (Yang et al., 2024, Wang et al., 5 Mar 2026). Subspace-aware algorithms such as TIES (TrIm, Elect Sign, and Merge) and DARE (Drop And REscale) enforce sparsity and sign-consistency to mitigate destructive interference (Zhou et al., 3 Feb 2025). The Fisher–Rao Karcher mean generalizes linear interpolation by interpreting the merge as the Riemannian barycenter on the statistical manifold, robustly preserving functionality under moderate heterogeneity (Wang et al., 5 Mar 2026). Twin-Merging frameworks employ modularization and input-conditional composition, dynamically integrating shared and exclusive knowledge to adapt to diverse test distributions (Lu et al., 2024).
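As a rough illustration of the delta-based methods named above, the sketch below implements task arithmetic, a TIES-style merge (trim by magnitude, elect a sign per coordinate, average only the agreeing entries), and a DARE-style drop-and-rescale step on flat NumPy vectors. It assumes all experts were fine-tuned from one shared `base`; the function names, the `keep_ratio`, `scale`, and `drop_prob` parameters, and the toy vectors are illustrative choices, not values from the cited papers.

```python
import numpy as np

def task_arithmetic(base, experts, scale=1.0):
    """Add the scaled sum of task vectors (expert - base) to the base."""
    return base + scale * sum(expert - base for expert in experts)

def trim(delta, keep_ratio):
    """Keep the largest-magnitude entries of a task vector, zero the rest.

    Ties at the threshold are all kept, so slightly more than
    keep_ratio * size entries may survive.
    """
    k = max(1, int(round(keep_ratio * delta.size)))
    threshold = np.sort(np.abs(delta).ravel())[-k]
    return np.where(np.abs(delta) >= threshold, delta, 0.0)

def ties_merge(base, experts, keep_ratio=0.5, scale=1.0):
    """TIES-style merge: trim, elect sign, average the sign-consistent entries."""
    deltas = np.stack([trim(expert - base, keep_ratio) for expert in experts])
    elected = np.sign(deltas.sum(axis=0))            # dominant sign per coordinate
    agree = (np.sign(deltas) == elected) & (deltas != 0)
    counts = np.maximum(agree.sum(axis=0), 1)        # avoid division by zero
    merged_delta = (deltas * agree).sum(axis=0) / counts
    return base + scale * merged_delta

def dare(delta, drop_prob, rng):
    """DARE-style step: randomly drop entries, rescale survivors by 1/(1-p)."""
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    keep = rng.random(delta.shape) >= drop_prob
    return delta * keep / (1.0 - drop_prob)

base = np.zeros(4)
expert_a = np.array([1.0, -0.1, 0.5, 0.0])
expert_b = np.array([-0.6, 0.8, 0.5, 0.0])

plain = task_arithmetic(base, [expert_a, expert_b])
resolved = ties_merge(base, [expert_a, expert_b], keep_ratio=0.5)
sparse_delta = dare(expert_a - base, drop_prob=0.5, rng=np.random.default_rng(0))
```

In the toy example the two experts disagree in sign on the first coordinate. Plain task arithmetic lets the updates partially cancel, while the TIES-style merge keeps only the update that matches the elected sign.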
3.2. Heterogeneous Model Merging Algorithms
Architectural Alignment and Projection
Layer alignment techniques such as Layer-wise Model Alignment (LMA) and Segment-wise Model Alignment (SMA) aggregate or segment layers based on representation similarity (e.g., via CKA), harmonizing models with differing depths (Xu et al., 2024). Elastic neuron zipping projects weight matrices of disparate width onto a common latent dimensionality using bipartite matching of neuron activations, enabling width-heterogeneous merging.
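The representation-similarity step can be illustrated with linear CKA, which compares two activation matrices recorded on the same inputs and tolerates different widths. The sketch below is a simplified stand-in, not the LMA or SMA procedure itself: it assigns each layer of a deeper model to its most similar layer in a shallower one by taking the argmax of the CKA matrix. The function names and the synthetic activations are assumptions made for the example.

```python
import numpy as np

def linear_cka(x, y):
    """Linear CKA between activations x (n, d1) and y (n, d2) on the same n inputs."""
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(y.T @ x, "fro") ** 2
    return cross / (np.linalg.norm(x.T @ x, "fro") * np.linalg.norm(y.T @ y, "fro"))

def match_layers(deep_acts, shallow_acts):
    """Assign each deep layer to the shallow layer with the highest CKA."""
    similarity = np.array(
        [[linear_cka(d, s) for s in shallow_acts] for d in deep_acts]
    )
    return similarity.argmax(axis=1), similarity

# Synthetic activations: the "deep" model is wider (12 vs. 8 units) and has
# two layers that resemble the first layer of the "shallow" model.
rng = np.random.default_rng(0)
feat_early = rng.normal(size=(200, 8))
feat_late = rng.normal(size=(200, 8))
lift = np.linalg.qr(rng.normal(size=(12, 12)))[0][:8]  # (8, 12), orthonormal rows

shallow_acts = [feat_early, feat_late]
deep_acts = [
    feat_early @ lift,
    feat_early @ lift + 0.1 * rng.normal(size=(200, 12)),
    feat_late @ lift,
]

assignment, similarity = match_layers(deep_acts, shallow_acts)
```

Here `assignment` groups the first two deep layers with the first shallow layer, which is the kind of many-to-one grouping a depth-alignment method needs before any weights are combined.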
Cross-Architecture and Operator-Level Fusion
When merging across fundamentally distinct architectures (e.g., GCN to GAT in GNNs), operator-space unification is key. The H-GRAMA framework expresses all layers as mixtures in a universal message-passing basis, aligns via CKA and Procrustes transformation, and fuses via closed-form regression and per-layer confidence weighting, preserving expert functionality without retraining and supporting universal operator mixture extensions (Bhattacharya et al., 22 Feb 2026).
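The Procrustes step mentioned above has a closed-form solution via the SVD. The sketch below shows only that step, under simplifying assumptions that are not taken from the H-GRAMA paper: both layers are plain linear maps of equal width, activations are collected on shared inputs, and fusion is an unweighted average after alignment. All names are illustrative.

```python
import numpy as np

def procrustes_rotation(source_acts, target_acts):
    """Orthogonal R minimizing ||source_acts @ R - target_acts||_F."""
    u, _, vt = np.linalg.svd(source_acts.T @ target_acts)
    return u @ vt

rng = np.random.default_rng(0)
inputs = rng.normal(size=(100, 8))        # shared probe inputs
weight_b = rng.normal(size=(8, 6))        # layer of model B

# Model A computes the same function in a rotated hidden basis.
true_rotation, _ = np.linalg.qr(rng.normal(size=(6, 6)))
weight_a = weight_b @ true_rotation.T

# Estimate the rotation from activations, then map A into B's basis.
rotation = procrustes_rotation(inputs @ weight_a, inputs @ weight_b)
aligned_a = weight_a @ rotation
fused = 0.5 * (aligned_a + weight_b)
```

Averaging `weight_a` and `weight_b` directly would mix two incompatible bases. After alignment the fused layer coincides with `weight_b` in this toy setup, because the two models differ only by the rotation.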
Function- and Adapter-Space Methods
HeteroFusion aligns modules at the level of functional units (e.g., LoRA adapters), employing SVD-guided denoising and cross-attention–driven fusion with update magnitude control to robustly transfer knowledge across Transformer backbones with topological mismatch (Chen et al., 2 Apr 2026). In multimodal or label-heterogeneous contexts, mapping functions and zero-padding reconcile parameter and label space asymmetries, with subsequent linear interpolation or more specialized strategies (e.g., AdaMMS, HM³) for fusion (Du et al., 31 Mar 2025, Hackmann, 2024).
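Zero-padding for label alignment can be sketched as follows. The example assumes two linear classification heads that sit on the same hidden representation but cover different label sets. Each head is expanded to the union label space with zero rows for labels it never saw, after which the heads have identical shapes and can be combined. Averaging each row only over the heads that actually own that label is an illustrative choice made here, not necessarily the rule used in the cited work, and the names `pad_head` and `merge_heads` are hypothetical.

```python
import numpy as np

def pad_head(weight, bias, labels, union_labels):
    """Expand a head (num_labels, hidden) to the union label space with zeros."""
    index = {label: i for i, label in enumerate(union_labels)}
    padded_w = np.zeros((len(union_labels), weight.shape[1]))
    padded_b = np.zeros(len(union_labels))
    owned = np.zeros(len(union_labels), dtype=bool)
    for row, label in enumerate(labels):
        padded_w[index[label]] = weight[row]
        padded_b[index[label]] = bias[row]
        owned[index[label]] = True
    return padded_w, padded_b, owned

def merge_heads(heads, union_labels):
    """Average padded heads row by row over the heads that own each label."""
    hidden = heads[0][0].shape[1]
    total_w = np.zeros((len(union_labels), hidden))
    total_b = np.zeros(len(union_labels))
    counts = np.zeros(len(union_labels))
    for weight, bias, labels in heads:
        padded_w, padded_b, owned = pad_head(weight, bias, labels, union_labels)
        total_w += padded_w
        total_b += padded_b
        counts += owned
    counts = np.maximum(counts, 1)  # labels owned by no head stay at zero
    return total_w / counts[:, None], total_b / counts

labels_a = ["toxic", "safe"]
labels_b = ["safe", "spam"]
head_a = (np.array([[1.0, 0.0], [0.0, 1.0]]), np.array([0.5, -0.5]), labels_a)
head_b = (np.array([[0.0, 3.0], [2.0, 2.0]]), np.array([0.5, 1.0]), labels_b)

union_labels = sorted(set(labels_a) | set(labels_b))  # ['safe', 'spam', 'toxic']
merged_w, merged_b = merge_heads([head_a, head_b], union_labels)
```

Only the shared label (`safe`) is actually averaged; the rows for `spam` and `toxic` are carried over from the single head that defines them.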
Search- and Validation-Driven Procedures
AutoMerge segments complex models into blocks according to architecture boundaries (e.g., CNN, Transformer, MLP), and employs Bayesian optimization (via a proxy validation score) to search for the best merging algorithm and hyperparameter configuration per block, outperforming any single, globally-applied method in multi-domain tasks (Lu et al., 30 Jan 2026).
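The idea of choosing a merge configuration per block from a validation proxy can be sketched with a much simpler search than the one described above. The example below swaps Bayesian optimization for an exhaustive grid over one interpolation coefficient per block, which is only practical for a handful of blocks. The block names, the grid, and the synthetic `proxy_score` function are all assumptions made for illustration; in practice the score would come from evaluating the merged model on held-out data.

```python
import itertools
import numpy as np

def interpolate_blocks(model_a, model_b, alphas):
    """Per-block linear interpolation: (1 - alpha) * A + alpha * B."""
    return {
        name: (1.0 - alphas[name]) * model_a[name] + alphas[name] * model_b[name]
        for name in model_a
    }

def search_blockwise(model_a, model_b, score_fn, grid=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Grid search over per-block coefficients, maximizing score_fn."""
    names = sorted(model_a)
    best_alphas, best_score = None, -np.inf
    for combo in itertools.product(grid, repeat=len(names)):
        alphas = dict(zip(names, combo))
        score = score_fn(interpolate_blocks(model_a, model_b, alphas))
        if score > best_score:
            best_alphas, best_score = alphas, score
    return best_alphas, best_score

# Toy models split into two blocks.
model_a = {"encoder": np.zeros(3), "head": np.zeros(2)}
model_b = {"encoder": np.ones(3), "head": np.ones(2)}

# Synthetic proxy: negative squared distance to a hidden "ideal" model.
ideal = {"encoder": np.full(3, 0.25), "head": np.ones(2)}

def proxy_score(model):
    return -sum(float(np.sum((model[name] - ideal[name]) ** 2)) for name in model)

best_alphas, best_score = search_blockwise(model_a, model_b, proxy_score)
```

The search recovers different coefficients for the two blocks, which a single global coefficient could not express. A fuller version would also search over the choice of merging algorithm per block rather than only the coefficient.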
Causal and Data-Driven Parameter Selection
Activated Parameter Locating leverages intervention-based importance assessment to determine which delta-parameters are functionally salient either in-domain or for unknown out-of-domain tasks (Kong et al., 2024). This prunes parameter updates according to their causal impact or gradient approximation, allowing heterogeneous merging guided by few-shot data.
4. Theoretical Foundations and Generalization Analysis
A unified generalization framework based on algorithmic stability elucidates why homogeneous merging (especially with uniform averaging and closely aligned fine-tuned experts) yields bounded excess risk, while heterogeneity in hyperparameters, data, or architecture inflates instability and empirical error (Li et al., 29 Jan 2026). Under assumptions of a smooth loss, bounded gradient variance, and bounded heterogeneity, the excess-risk bound for merged models quantifies the trade-off between stability and optimization and shows how heterogeneity propagates through merging coefficients and parameter misalignment.
Theoretical results on geometric and information-theoretic grounds reinforce why functional (Fisher-Rao) and Karcher mean–based merges outperform linear interpolations in curved loss landscapes, especially as divergence among experts increases (Wang et al., 5 Mar 2026). Non-commutativity, non-associativity, and bias order in decision-space algebras for arbitrary classifier merging provide formal constraints and enable monotonic bias profile sculpting (Giabbanelli et al., 2015).
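To make the geometric point concrete, the sketch below computes a Karcher (Fréchet) mean under the Fisher-Rao metric for the simplest case, categorical distributions. It relies on a standard fact: the square-root map sends the probability simplex onto the positive part of the unit sphere, where Fisher-Rao geodesics are great-circle arcs. This is an illustration on output distributions only, not the parameter-space procedure of the cited paper, and the fixed-point iteration with unit step size is an assumption that works for well-clustered inputs.

```python
import numpy as np

def sphere_log(base, point):
    """Tangent vector at `base` pointing to `point` along the great circle."""
    cos_theta = np.clip(base @ point, -1.0, 1.0)
    tangent = point - cos_theta * base
    norm = np.linalg.norm(tangent)
    if norm < 1e-12:
        return np.zeros_like(base)
    return np.arccos(cos_theta) * tangent / norm

def sphere_exp(base, tangent):
    """Follow the geodesic from `base` in direction `tangent`."""
    norm = np.linalg.norm(tangent)
    if norm < 1e-12:
        return base
    return np.cos(norm) * base + np.sin(norm) * tangent / norm

def fisher_rao_mean(distributions, weights=None, steps=100, tol=1e-10):
    """Weighted Karcher mean of categorical distributions under Fisher-Rao."""
    points = np.sqrt(np.asarray(distributions, dtype=float))  # map to the sphere
    if weights is None:
        weights = np.full(len(points), 1.0 / len(points))
    mean = weights @ points
    mean = mean / np.linalg.norm(mean)                        # initial guess
    for _ in range(steps):
        step = sum(w * sphere_log(mean, p) for w, p in zip(weights, points))
        if np.linalg.norm(step) < tol:
            break
        mean = sphere_exp(mean, step)
    return mean ** 2                                          # back to the simplex

p = np.array([1.0, 0.0])
q = np.array([0.5, 0.5])
geodesic_mean = fisher_rao_mean([p, q])
linear_mean = 0.5 * (p + q)
```

For these two distributions the geodesic midpoint is approximately `[0.854, 0.146]`, whereas the linear average is `[0.75, 0.25]`. The gap between the two grows as the inputs move further apart, which mirrors the claim that curvature matters more as experts diverge.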
5. Empirical Insights, Performance, and Practical Recommendations
| Scenario | Homogeneous Methods | Heterogeneous Methods |
|---|---|---|
| Same backbone/transforms | Linear avg., delta arithmetic, TIES/DARE | Not required; all methods reduce to classical forms |
| Layer/layout mismatch | Fails (representation collapse, rank loss) | Layer alignment via SMA/LMA, operator-space transport, feature/CKA-based mapping |
| Hidden size/neuron mismatch | Not supported | Neuron zipping, cross-architecture projection, universal adapter fusion |
| Label space heterogeneity | Not supported | Zero-padding, output head augmentation (HM³) |
| Architectures/modalities | Not supported | Functional/adapter-based fusion, operator mixture (H-GRAMA), mapping-and-merge |
| Data-efficient fusion | Direct averaging/tuning not needed | Unlabeled response consistency (AdaMMS), small replay sets (HeteroFusion) |
Homogeneous merging achieves near-oracle performance when models share initialization and minimal drift, with functional or Karcher mean merges outperforming linear interpolation as heterogeneity grows (Wang et al., 5 Mar 2026). In highly divergent settings (multi-modal, cross-family, layer-size-mismatched), blockwise search, operator-space fusion, or adapter-level alignment approaches are clearly superior (Lu et al., 30 Jan 2026, Bhattacharya et al., 22 Feb 2026, Xu et al., 2024, Chen et al., 2 Apr 2026).
For performance-sensitive or safety-critical applications (e.g., medical imaging under domain shifts), adaptive schemes leverage entropy or consistency proxies to reweight expert contributions batch-wise, outperforming static merges across distribution shifts (Ambekar et al., 24 Feb 2026).
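One simple way to realize entropy-based reweighting is sketched below: each expert's predictive entropy is averaged over the current batch, and a softmax over the negative entropies yields merging coefficients that favor the more confident expert. This is a generic sketch rather than the rule from the cited paper; the `temperature` parameter, the function names, and the toy probabilities are assumptions.

```python
import numpy as np

def batch_entropy(probs, eps=1e-12):
    """Mean Shannon entropy (in nats) of a batch of predictive distributions."""
    return float(-(probs * np.log(probs + eps)).sum(axis=1).mean())

def entropy_weights(expert_probs, temperature=1.0):
    """Softmax over negative batch entropies: lower entropy, larger weight."""
    entropies = np.array([batch_entropy(p) for p in expert_probs])
    logits = -entropies / temperature
    logits = logits - logits.max()          # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()

# Predictions of two experts on the same unlabeled batch (rows sum to 1).
confident_expert = np.array([[0.9, 0.1], [0.8, 0.2]])
uncertain_expert = np.array([[0.5, 0.5], [0.6, 0.4]])

weights = entropy_weights([confident_expert, uncertain_expert], temperature=0.1)

# Use the batch-specific weights as merging coefficients for the parameters.
params_confident = np.array([1.0, 1.0])
params_uncertain = np.array([0.0, 0.0])
merged_params = weights[0] * params_confident + weights[1] * params_uncertain
```

Because the weights are recomputed for every batch, the merged parameters can shift toward whichever expert is better matched to the incoming data. Entropy is only a proxy, though: a confidently wrong expert would also receive a large weight.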
Practical guidelines include: (1) segment according to architectural heterogeneity, (2) use validation-driven or entropy-based weighting when possible, (3) avoid excessive merging of unrelated or highly divergent experts, (4) supplement parameter-space fusion with structural or functional alignment, and (5) in the absence of labels, employ unsupervised consistency measures.
6. Applications Across Modalities and Deployment Scenarios
Model merging methodologies, both homogeneous and heterogeneous, now underpin state-of-the-art practice in:
- LLMs: Generalization across instruction-, safety-, and reasoning-specialized LLMs via task vectors, Fisher-Karcher merges, and functional adapters (Wang et al., 5 Mar 2026, Chen et al., 2 Apr 2026, Zhang et al., 2024).
- Multimodal Models: Fusion of vision-language models or domain-expert MLLMs using mapping, merging, and search paradigms without supervision (Du et al., 31 Mar 2025).
- Mixture-of-Experts Frameworks: Modular and parameter-efficient MoE assembly from diverse domain experts, incorporating fast routing and interference avoidance (Zhou et al., 3 Feb 2025).
- Graph Neural Networks: Operator-space fusion enabling training-free merging of GCN, GAT, SAGE, and GIN architectures (Bhattacharya et al., 22 Feb 2026).
- Classifier Ensembles and Guardrails: Training-free consolidation of classifiers with heterogeneous output spaces (e.g., guardrails for LLMs) by output head alignment (Hackmann, 2024).
- Continual, Federated, and Domain-Generalized Learning: Dynamic merging for lifelong adaptation, privacy-preserving learning, and distributionally robust deployment (Yang et al., 2024).
7. Limitations and Open Directions
Heterogeneous model merging remains a rapidly evolving field with persistent challenges:
- Merging across radically disparate architectures (e.g., vision transformers to graph networks) is only nascently addressed, requiring further operator-space and universal basis development (Bhattacharya et al., 22 Feb 2026).
- Theoretical conditions guaranteeing monotonic improvement or bounded degradation under arbitrary merging paths are incomplete, especially in deeply non-Euclidean, highly nonlinear landscapes (Wang et al., 5 Mar 2026, Li et al., 29 Jan 2026).
- Data-free heterogeneous merging, particularly with minimal alignment (no replay sets or validation data), is an open challenge (Chen et al., 2 Apr 2026).
- Cycle-consistency, poisoning resilience, and intellectual property protection under iterative or continual merging are active areas of research (Yang et al., 2024).
- Extensibility to multi-modal, cross-label, and layered output space scenarios (e.g., fusion of object detectors with captioning models) is in early development.
Future work is focused on generalizing functional and operator-space merges, automated architecture inference for alignment, merging under federated and privacy-constrained settings, and more expressive weighting and adaptation strategies at both parameter- and function-space levels.
References (arXiv IDs):
- (Wang et al., 5 Mar 2026) Functionality-Oriented LLM Merging on the Fisher-Rao Manifold
- (Chen et al., 2 Apr 2026) Can Heterogeneous LLMs Be Fused?
- (Du et al., 31 Mar 2025) AdaMMS: Model Merging for Heterogeneous Multimodal LLMs
- (Zhou et al., 3 Feb 2025) MergeME: Model Merging Techniques for Homogeneous and Heterogeneous MoEs
- (Yang et al., 2024) Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities
- (Giabbanelli et al., 2015) An Algebra to Merge Heterogeneous Classifiers
- (Xu et al., 2024) Training-free Heterogeneous Model Merging
- (Lu et al., 30 Jan 2026) AutoMerge: Search-Based Model Merging Framework for Effective Model Reuse
- (Bhattacharya et al., 22 Feb 2026) Training-Free Cross-Architecture Merging for Graph Neural Networks
- (Hackmann, 2024) HM3: Heterogeneous Multi-Class Model Merging
- (Zhang et al., 27 Mar 2025) Model Assembly Learning with Heterogeneous Layer Weight Merging
- (Lu et al., 2024) Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging
- (Zhang et al., 2024) Unconstrained Model Merging for Enhanced LLM Reasoning
- (Ambekar et al., 24 Feb 2026) The Mean is the Mirage: Entropy-Adaptive Model Merging under Heterogeneous Domain Shifts in Medical Imaging
- (Li et al., 29 Jan 2026) Understanding Model Merging: A Unified Generalization Framework for Heterogeneous Experts
- (Kong et al., 2024) Activated Parameter Locating via Causal Intervention for Model Merging