
Cross-Layer Merging: Methods & Impact

Updated 25 February 2026
  • Cross-layer merging is a collection of adaptive methods that fuse heterogeneous layers—such as neural network components and protocol stacks—to enhance efficiency and robustness.
  • It employs layer-wise scaling, coefficient learning, and auto-regressive optimization to address inter-layer dependencies and mitigate covariate shifts.
  • Applications span from consolidating neural models to optimizing wireless protocols, yielding measurable improvements in accuracy, throughput, and overall system QoE.

Cross-layer merging is a collective term for methodologies that perform adaptive, structure-aware integration across system layers—be they neural network layers, software protocol stacks, or network architecture levels—with the goal of optimally fusing information, functionality, or parameters to enhance system efficiency, robustness, or performance. The defining property is the explicit coordination or optimization that spans (and leverages the interactions between) heterogeneous layers rather than handling each in isolation or uniformly. Applications range from state-of-the-art neural model merging and operator-space GNN fusion, to real-time wireless cross-layer protocol design and resource scheduling.

1. Foundational Principles and Definitions

Classical approaches to model or system fusion often operate at a single layer (e.g., parameter averaging in neural networks, or MAC-only optimizations in networks) and treat components as either fully interchangeable or strictly isolated. Cross-layer merging, in contrast, is characterized by:

  • Recognition of strong heterogeneity across layers: Early layers may encode task-agnostic or highly redundant information, while deeper layers capture stable, task-specific, or domain-specialized representations (Wang et al., 10 Feb 2026, Alcover-Couso et al., 2024, Yao et al., 20 May 2025).
  • Adaptation of merging rules, coefficients, or schedules along the layer-wise structure of the system (e.g., scaling, weighting, or projecting deltas per layer, rather than globally).
  • Explicit modeling of inter-layer or cross-component dependencies, such as covariate shift effects, interface constraints, or operator-mismatch in the case of heterogeneous architectures (Buzzega et al., 29 Aug 2025, Bhattacharya et al., 22 Feb 2026).
  • In protocol or network stacks, cross-layer merging involves joint adaptation, assignment, or optimization where information or decisions from multiple protocol layers (e.g., APP, MAC, PHY) are marshaled into a unified control policy (Liu et al., 2024, 0712.2497, 0905.4087).

The typical aim is to combine the desirable properties of conventionally separated layers—such as generalization ability in early layers and specialization in later layers—or to reconcile potentially conflicting objectives prevalent at each layer (e.g., rate-distortion versus transmission cost in wireless video streaming (0905.4087)).

2. Representative Methodologies in Neural Architectures

Layer-wise Scaling and Proxy-based Scheduling

Several methods use deterministic or data-driven per-layer rescaling to suppress deleterious interference in fragile (shallow) layers and amplify stable contributions in deeper layers. For example, LARV ("Layer-wise Adaptive Rescaling Veneer") introduces a data-free, deterministic schedule of per-layer scales s_ℓ, derived from closed-form, weight-only diagnostics such as effective-rank contrast and commutator conflict between the base and update tensors; these are mapped via three-tier or tanh gates to control the merge strength per layer (Wang et al., 10 Feb 2026). This yields significant accuracy gains and robustness to input corruptions.
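The mechanics of such a data-free, per-layer gate can be sketched as follows. This is a minimal illustration, not the published LARV recipe: the effective-rank diagnostic is standard, but the gate constants and the way the contrast is mapped to a scale are assumptions.

```python
# Sketch of LARV-style per-layer rescaling (illustrative: the diagnostic
# combination and gate constants are NOT the published ones).
import numpy as np

def effective_rank(W: np.ndarray) -> float:
    """Entropy-based effective rank of a weight matrix."""
    s = np.linalg.svd(W, compute_uv=False)
    p = s / (s.sum() + 1e-12)
    return float(np.exp(-(p * np.log(p + 1e-12)).sum()))

def layer_scale(base: np.ndarray, delta: np.ndarray, tau: float = 1.0) -> float:
    """Map the effective-rank contrast between base weight and task delta
    to a merge strength in (0, 1) via a tanh gate."""
    contrast = effective_rank(delta) / (effective_rank(base) + 1e-12)
    return 0.5 * (1.0 + np.tanh(contrast - tau))

def merge_layer(base: np.ndarray, delta: np.ndarray) -> np.ndarray:
    """Apply the per-layer scale when adding the update to the base weights."""
    return base + layer_scale(base, delta) * delta
```

Because the schedule depends only on the weights themselves, it can be computed once per layer at merge time with no calibration data.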

This approach is structurally mirrored in domain adaptation for segmentation, where early backbone layers are averaged to "smooth out noise and combine features," but task-specific heads are preserved from a strong anchor model (Alcover-Couso et al., 2024). Importance scores, statistical proxies, or functional measures (e.g., mutual information between activations in ACM (Yao et al., 20 May 2025)) can be used to adapt weights layer-by-layer.
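The backbone-averaging / head-anchoring split above reduces to a simple rule over the parameter dictionary. A minimal sketch, assuming state dicts of NumPy arrays and a hypothetical `head.` name prefix:

```python
# Minimal sketch of backbone averaging with head anchoring, in the spirit of
# the UDA segmentation recipe above; the `head.` naming convention is
# hypothetical.
import numpy as np

def merge_backbone_keep_head(models, anchor, head_prefix="head."):
    """Average all parameters whose names do not start with `head_prefix`;
    copy head parameters verbatim from the anchor model."""
    merged = {}
    for name in anchor:
        if name.startswith(head_prefix):
            merged[name] = anchor[name].copy()  # preserve the specialist head
        else:
            # smooth out noise by averaging the general-purpose backbone
            merged[name] = np.mean([m[name] for m in models], axis=0)
    return merged
```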

Data-driven, Layer-wise Coefficient Learning

Other methods go beyond heuristics, learning per-layer (and even sub-tensor or "chunk-wise") coefficients by explicit alignment of activations and/or logits. Expert Merging learns α_{l,e} for each expert e and layer l via unsupervised alignment losses; "Expert Merging++" refines this further with importance-guided chunking, splitting high-importance layers into multiple independently merged chunks (Zhang et al., 30 Sep 2025). LOT Merging directly minimizes the discrepancy in feature representations ("feature drift") at each layer, leading to convex (often closed-form) solutions for each fusion subproblem (Sun et al., 29 May 2025).
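For a single linear layer and two experts, alignment-based coefficient fitting admits a closed-form solution, which conveys the flavor of these convex per-layer subproblems. The objective below (align the merged activations with the mean of the experts' activations) is an illustrative stand-in, not the exact Expert Merging or LOT Merging loss:

```python
# Hedged sketch: closed-form per-layer coefficient for a two-expert merge,
# chosen to align merged activations with a target. Illustrative objective,
# not the published one.
import numpy as np

def layer_alpha(W1, W2, X, target):
    """alpha minimizing ||(alpha*W1 + (1-alpha)*W2) @ X - target||_F^2,
    which is a scalar least-squares problem in alpha."""
    D = (W1 - W2) @ X          # direction of change in activation space
    R = target - W2 @ X        # residual to explain
    return float((D * R).sum() / ((D * D).sum() + 1e-12))

def merge_two_experts(W1, W2, X):
    """Merge one layer of two experts on calibration inputs X."""
    target = 0.5 * (W1 @ X + W2 @ X)   # alignment target: mean activations
    a = layer_alpha(W1, W2, X, target)
    return a * W1 + (1 - a) * W2, a
```

With the mean-activation target the optimum is exactly α = 1/2; richer targets (e.g., logits of a reference model, or importance-weighted chunks) move the coefficients away from uniform averaging.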

Chain of Merges takes the further step of explicit auto-regressive optimization: at each layer, it refits the merge objective to the actual (post-merge) input distribution caused by all prior merged layers, mitigating the covariate shift intrinsic to "layerwise-independent" strategies (Buzzega et al., 29 Aug 2025).
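The auto-regressive idea can be sketched for a stack of linear layers: each merged layer is refit against the activations the already-merged prefix actually produces, rather than the experts' original inputs. The least-squares refit and mean-of-experts target here are simplifying assumptions, not the published CoM objective:

```python
# Sketch of an auto-regressive (Chain-of-Merges-style) layer-wise merge for
# linear stacks. Least-squares refits on post-merge inputs mitigate the
# covariate shift that layerwise-independent merging ignores.
import numpy as np

def com_merge(stacks, X):
    """stacks: list of expert models, each a list of weight matrices.
    X: calibration inputs, shape (d_in, n_samples).
    Returns merged weights fit layer-by-layer on post-merge activations."""
    n_layers = len(stacks[0])
    merged, h = [], X                      # h: activations of the merged prefix
    expert_h = [X for _ in stacks]         # each expert's own activations
    for l in range(n_layers):
        # target: mean of expert outputs at this layer (on their own inputs)
        targets = [stacks[e][l] @ expert_h[e] for e in range(len(stacks))]
        T = np.mean(targets, axis=0)
        # refit the merged layer on the ACTUAL post-merge input h
        W, *_ = np.linalg.lstsq(h.T, T.T, rcond=None)
        merged.append(W.T)
        h = W.T @ h
        expert_h = list(targets)
    return merged
```

When the experts are identical the recursion recovers the original weights exactly; when they differ, the refit at layer l absorbs the drift accumulated by layers 1..l-1.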

Heterogeneous and Cross-Architecture Merging

Traditional parameter-space merging is not suitable for models with disparate architectures (e.g., GCN and GAT for GNNs). H-GRAMA introduces operator-space merging: all parent layer operations are decomposed into a shared basis (Universal Message Passing Mixture), alignment is achieved via CKA and orthogonal Procrustes maps, and final parameter fusion proceeds convexly and is further corrected via label-free message-statistics matching (Bhattacharya et al., 22 Feb 2026). Model Assembly Learning extends cross-layer merging to arbitrary layer pairs across architectures, handling width mismatches by permutation/projection and zero-padding, and formulating practical strategies for selective integration (Zhang et al., 27 Mar 2025).
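One ingredient of cross-architecture assembly, zero-padding to reconcile width mismatches before convex combination, is easy to make concrete. This sketch omits the permutation/projection alignment step entirely, so it shows only the shape handling, not the full method:

```python
# Hedged sketch of width-mismatch handling via zero-padding, one ingredient
# of cross-architecture layer merging (permutation alignment omitted).
import numpy as np

def pad_to(W, rows, cols):
    """Zero-pad a weight matrix up to the target shape."""
    out = np.zeros((rows, cols), dtype=W.dtype)
    out[:W.shape[0], :W.shape[1]] = W
    return out

def merge_mismatched(W_a, W_b, lam=0.5):
    """Convexly combine two layers of different widths in the padded space."""
    rows = max(W_a.shape[0], W_b.shape[0])
    cols = max(W_a.shape[1], W_b.shape[1])
    return lam * pad_to(W_a, rows, cols) + (1 - lam) * pad_to(W_b, rows, cols)
```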

3. Cross-Layer Merging in Communication and Network Protocols

In adaptive wireless systems, cross-layer merging refers to the explicit coupling, co-optimization, or feedback-driven coordination of separate protocol layers.

  • In "Structural Solutions for Cross-Layer Optimization," Fu & van der Schaar cast the joint adaptation of application (video packet scheduling), MAC (retransmission), and PHY (modulation/power) parameters into a single finite-horizon MDP, merging the metrics and control variables across the protocol stack into unified Bellman recursions. The optimal policy exploits a prioritized DAG over packet dependencies and achieves substantial gains in both video quality and efficiency (0905.4087).
  • StreamOptix decomposes the video delivery problem into Application, MAC, and PHY subproblems with explicit closed-loop feedback (e.g., soft-ACKs, block error measurements, link capacity predictions), operationalizing a full cross-layer control loop, and documents measurable improvements in SSIM/PSNR/QoE over isolated-layer approaches (Liu et al., 2024).
  • In protocol optimization, rigorous formalisms (e.g., layered MDPs with message exchange (0712.2497)) retain modularity by exchanging compact performance summaries as "messages" while achieving global optimality.
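The unified Bellman recursion at the heart of these merged-MDP formulations can be written generically: a single backward induction over joint cross-layer actions. The states, actions, reward, and transition model below are toy placeholders, not those of the cited papers:

```python
# Toy finite-horizon MDP sketch of merged cross-layer control: one Bellman
# recursion over joint (e.g., PHY mode x MAC retry) actions. All model
# components here are illustrative placeholders.

def backward_induction(T, states, actions, reward, trans):
    """Compute V_t(s) = max_a [ r(s,a) + sum_s' P(s'|s,a) * V_{t+1}(s') ]
    by backward induction; returns terminal-to-initial values and the
    per-stage greedy policy."""
    V = {s: 0.0 for s in states}
    policy = []
    for t in reversed(range(T)):
        newV, pi = {}, {}
        for s in states:
            # joint action space merges decisions from multiple layers
            best = max(
                (reward(s, a) + sum(p * V[s2] for s2, p in trans(s, a)), a)
                for a in actions
            )
            newV[s], pi[s] = best
        V = newV
        policy.append(pi)
    policy.reverse()
    return V, policy
```

The "merging" lies in the action space: each action bundles decisions that conventionally live in separate protocol layers, so the recursion optimizes them jointly rather than in isolation.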

4. Theoretical Guarantees and Practical Schedules

A recurring thread is that optimal cross-layer merging must mediate between preserving the specificity or specialization required at deep layers or higher protocol levels (e.g., semantic segmentation heads, distributive policy vectors), and aggregating or smoothing information in the more general, shallow domains (e.g., vision backbones, low-level communication primitives).

Table: Structural Taxonomy of Cross-Layer Merging (select examples)

| Approach/Domain | Cross-Layer Mechanism | Notable Theoretical/Empirical Benefit |
| --- | --- | --- |
| LARV (Wang et al., 10 Feb 2026) | Data-free, per-layer scaling via matrix proxy | Improves robustness; up to +3.1 pp FusionBench accuracy |
| CoM (Buzzega et al., 29 Aug 2025) | Auto-regressive moment-matching per layer | Mitigates covariate shift; 91.7% ViT-B/32 accuracy |
| Expert Merging++ (Zhang et al., 30 Sep 2025) | Layer/chunk-wise α learned via alignment loss | Surpasses supervised Mixture Training |
| StreamOptix (Liu et al., 2024) | Closed-loop APP/MAC/PHY feedback, adaptive RBs | Up to +75% QoE over uncoupled MPC-ABR |
| H-GRAMA (Bhattacharya et al., 22 Feb 2026) | Operator-space fusion, cross-arch alignment | 1.2x–1.9x inference speedup, >90% specialist retention |
| Model Assembly (Zhang et al., 27 Mar 2025) | Selective cross-arch, layer-wise permutation | 4–5× reduction in loss barrier for LMC |
| Layer-wise Model Merging / UDA (Alcover-Couso et al., 2024) | Backbone averaging + head anchoring | +4.2–6.8% mIoU in semantic segmentation |

This layered mediation is formalized in the negative-transfer bound for LOT Merging, which relates the total loss in accuracy to the sum of layerwise feature drifts, and in the optimal Bellman decomposition and message-passing layers in protocol MDPs (Sun et al., 29 May 2025, 0712.2497).
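Schematically, a bound of this kind controls the merged model's loss by the accumulated per-layer feature drift; the exact constants and norms in (Sun et al., 29 May 2025) differ, so the following conveys only the general shape:

```latex
% Schematic layer-wise negative-transfer bound: accuracy loss of the merged
% model is controlled by the sum of per-layer feature drifts. Constants
% C_\ell and the choice of norm are illustrative.
\mathcal{L}(\theta_{\mathrm{merged}}) - \mathcal{L}(\theta_{\mathrm{expert}})
\;\le\; \sum_{\ell=1}^{L} C_\ell \,
\bigl\| h_\ell^{\mathrm{merged}} - h_\ell^{\mathrm{expert}} \bigr\|
```

This is why minimizing per-layer drift (as LOT Merging does directly) translates into a guarantee on end-to-end performance.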

5. Applications and Impact across Domains

Neural Models

Cross-layer merging is foundational to the efficient consolidation of specialist or task-adapted models (e.g., SFT experts for vision/language, domain-adapted segmenters, expert GNNs) into a single inference model, providing accuracy and robustness gains while reducing inference and deployment cost.

Communication and Networking

In protocol stacks and multi-path transmission:

  • Cross-layer merging that marshals queuing, service-time, and packet-loss rate information, as in the QueueAware MPTCP scheduler, yields fairer and higher-throughput flow scheduling (Shreedhar et al., 2017).
  • In ad hoc wireless networks, "merging" MAC-layer collision resolution with PHY-layer collaborative beamforming enables collective decoding and spatial packet separation, pushing network throughput beyond traditional limits (0704.2841).
  • Modular, layered-MDP cross-layer optimization realizes global optimality without violating encapsulation of open-standard protocol stacks (0712.2497).
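The queue-aware scheduling idea above amounts to ranking subflows by an estimated per-packet delay that merges queuing, service-time, and loss information. A minimal sketch; the field names, delay model, and loss correction are hypothetical, not the QueueAware scheduler's actual estimator:

```python
# Hedged sketch of queue-aware subflow selection in the spirit of the
# QueueAware MPTCP scheduler: rank subflows by an estimated delay that
# merges queuing, service-time, and loss information.
from dataclasses import dataclass

@dataclass
class Subflow:
    name: str
    queued_bytes: float   # bytes already waiting in the send queue
    rate_bps: float       # estimated service rate
    rtt_s: float          # smoothed round-trip time
    loss: float           # packet loss probability

def expected_delay(f: Subflow, pkt_bytes: float = 1500.0) -> float:
    """Queuing + service delay, inflated by expected retransmissions."""
    service = (f.queued_bytes + pkt_bytes) * 8.0 / f.rate_bps
    retx = 1.0 / max(1.0 - f.loss, 1e-6)   # mean transmissions, i.i.d. loss
    return (service + f.rtt_s / 2.0) * retx

def pick_subflow(flows):
    """Schedule the next packet on the subflow with lowest expected delay."""
    return min(flows, key=expected_delay)
```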

6. Open Challenges, Limitations, and Future Directions

Current cross-layer merging techniques are effective in settings with aligned or structurally similar layers but face several challenges:

  • Heterogeneous merging requires operator-space formalization and careful alignment (e.g., GNNs with fundamentally different message propagation) (Bhattacharya et al., 22 Feb 2026).
  • Scalability of activation- or mutual-information-based techniques may be limited by dimensionality for very wide architectures (Yao et al., 20 May 2025, Buzzega et al., 29 Aug 2025).
  • Most methods focus on merging within a fixed architecture set; generalizing to Mixture-of-Experts, MoE-LMs, and unsupervised chunking remains an active area.
  • In communication protocols, constructing fully decentralized or online cross-layer optimization (with minimal message complexity) is an ongoing research challenge (0712.2497, Liu et al., 2024).

Potential future directions include online or adaptive per-layer scaling, operator-space extensions for domain generalization, and fusion-aware training protocols that explicitly anticipate cross-layer merging in their loss design.

7. Comparative Analysis and Synthesis

Cross-layer merging differs fundamentally from uniform or global approaches by exploiting layer-wise (or architecture-wise) structure and heterogeneity. Its value is empirically validated across vision, language, multimodal, and communication settings by consistent improvements in target metrics (e.g., +1–7 points on FusionBench, +2–4 dB SNR in wireless transmission, superior mIoU in segmentation). The theoretical underpinnings (auto-regressive covariate-shift correction, operator-space unification, and convex quadratic feature alignment) provide principled guarantees of stability and transferability.

Current evidence suggests that cross-layer (and, more generally, structure-aware) merging is essential for robust, efficient model and system fusion in both neural and networked contexts, and will likely be a key component of future scalable, open-ended learning and communication systems (Wang et al., 10 Feb 2026, Sun et al., 29 May 2025, Alcover-Couso et al., 2024, 0712.2497, 0905.4087, Buzzega et al., 29 Aug 2025, Bhattacharya et al., 22 Feb 2026).
