Cross-Model Stability Transfer
- Cross-model stability transfer is a framework for measuring and aligning learned representations across distinct neural architectures using linear mappings and shaping operators.
- It employs dual-pathway decomposition to separate structured inductive biases from residual adaptations, ensuring robust semantic invariance.
- Empirical measurements and theoretical bounds validate its effectiveness in enabling model distillation, modularity, and parameter-efficient adaptation.
Cross-model stability transfer refers to the principled study, measurement, and facilitation of the stability and semantic alignment of learned internal representations across different neural architectures or model instantiations. The field addresses the key question of how, why, and under what structural conditions representations, feature spaces, or task-specific adapters can not only generalize within a single model but also transfer, often under simple mappings, across models with different priors, parameterizations, or inductive biases (Nikooroo et al., 5 Aug 2025).
1. Conceptual Foundations: Representational Stability and Alignment
Representational stability denotes the sensitivity of a model's learned features to changes in architectural or procedural priors. If models $M_1$ and $M_2$ are trained on the same task, with respective representation mappings $f_1$ and $f_2$, then stability asks that there exists a transformation $T$ such that $\|T(f_1(x)) - f_2(x)\|$ is small for typical inputs $x$.
Alignment quantifies the possibility of finding a (frequently linear) transform $T$ with $T(f_1(x)) \approx f_2(x)$. Standard alignment metrics include Centered Kernel Alignment (CKA), principal-angle subspace overlaps, Procrustes error, and cross-model probe accuracy (Nikooroo et al., 5 Aug 2025).
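As a concrete illustration, the sketch below fits such a linear transform $T$ by ordinary least squares and reports the residual misalignment. The embedding matrices, dimensions, and noise level are placeholders rather than values from the cited work.

```python
# Minimal sketch: fit a linear alignment map T between two models' embeddings by
# least squares and measure the residual misalignment. Z1 and Z2 stand in for
# f1(x) and f2(x) evaluated row-wise on a shared probe set.
import numpy as np

rng = np.random.default_rng(0)
n, d1, d2 = 512, 64, 48
Z1 = rng.normal(size=(n, d1))                          # rows: f1(x) for each probe input x
M = rng.normal(size=(d1, d2)) / np.sqrt(d1)
Z2 = Z1 @ M + 0.05 * rng.normal(size=(n, d2))          # a "related" model's embeddings

# T = argmin_T ||Z1 @ T - Z2||_F  (ordinary least squares)
T, *_ = np.linalg.lstsq(Z1, Z2, rcond=None)

residual = np.linalg.norm(Z1 @ T - Z2) / np.linalg.norm(Z2)
print(f"relative alignment residual: {residual:.3f}")  # small value => well-aligned representations
```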
This general framework is instantiated in diverse research lines:
- Cross-architecture feature geometry (Nikooroo et al., 5 Aug 2025)
- Portability of concept steering vectors in LLMs (Huang et al., 2 Jan 2025)
- Adapter transfer and subspace-based parameterization (Farhadzadeh et al., 27 Jan 2025)
- Transferring unstable features for robust classifier training (Bao et al., 2021)
2. Structural Decomposition: Shaping Operators and Corrective Paths
One robust class of models for cross-model stability transfer is the dual-pathway decomposition $W = S + \Delta$, where the shaping operator $S$ encodes inductive bias (block-sparse, low-rank, or spectral structure) and $\Delta$ is a residual corrective path (Nikooroo et al., 5 Aug 2025). This decomposition separates the model into an explicitly structured pathway reflecting architectural priors and a learned component that adapts to the residual semantic content required by the task.
Empirical and theoretical analyses show that closeness of the shaping operators $S$ and controlled drift in the residuals $\Delta$ directly imply stable, transferable representations across models (Nikooroo et al., 5 Aug 2025). Analogous structure appears in LoRA-X, which explicitly constrains low-rank adapters to operate within the dominant singular subspace of the base model's weights, enabling projection-based, data-free cross-model transfer (Farhadzadeh et al., 27 Jan 2025).
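A minimal sketch of such a dual-pathway layer follows, assuming the weight-level reading $W = S + \Delta$ used above; the low-rank form of $S$ is one illustrative choice of structure, not the only one considered in the cited work.

```python
# Illustrative dual-pathway layer: a structured shaping operator S plus a small
# residual corrective path Delta. The low-rank structure of S is one example prior.
import numpy as np

rng = np.random.default_rng(1)
d_in, d_out, r = 64, 64, 8

# Shaping operator S: explicitly structured pathway (here low-rank, rank r).
U = rng.normal(size=(d_out, r))
V = rng.normal(size=(r, d_in))
S = U @ V

# Residual corrective path Delta: small, unconstrained adaptation learned on the task.
Delta = 0.01 * rng.normal(size=(d_out, d_in))

def forward(x):
    """Dual-pathway layer: structured shaping plus residual correction."""
    return S @ x + Delta @ x

y = forward(rng.normal(size=d_in))
print(y.shape)  # (64,)
```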
3. Theoretical Results: Alignment Bounds and Mapping Conditions
Provable guarantees characterize when and how cross-model stability transfer succeeds. Let $W_1 = S_1 + \Delta_1$ and $W_2 = S_2 + \Delta_2$ denote the decomposed maps of two models, and let $\varepsilon = \|S_1 - S_2\|_{\mathrm{op}} + \|\Delta_1 - \Delta_2\|_{\mathrm{op}}$. Then a linear alignment mapping $T$ yields a bound of the form
$$\|T W_1 x - W_2 x\| \;\le\; \varepsilon\,\|x\| \quad \text{for all } x,$$
and, integrated over the data distribution $\mathcal{D}$,
$$\mathbb{E}_{x \sim \mathcal{D}}\big[\|T W_1 x - W_2 x\|^2\big] \;\le\; \varepsilon^2\,\mathbb{E}_{x \sim \mathcal{D}}\big[\|x\|^2\big]$$
(Nikooroo et al., 5 Aug 2025). This formalizes the intuition that stability transfer is possible whenever model structures (e.g., shaping operators) remain close in operator norm and residual paths are well-controlled.
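As a quick sanity check, the snippet below verifies the identity-map special case ($T = I$) of a bound of this form on random matrices; the decomposition and operator norms follow the notation above rather than the paper's exact statement.

```python
# Numerical check of the bound in its identity-map special case (T = I):
# ||W1 x - W2 x|| <= (||S1 - S2||_op + ||D1 - D2||_op) * ||x||.
import numpy as np

rng = np.random.default_rng(2)
d = 32
S1 = rng.normal(size=(d, d))
S2 = S1 + 0.05 * rng.normal(size=(d, d))        # close shaping operators
D1 = 0.1 * rng.normal(size=(d, d))
D2 = 0.1 * rng.normal(size=(d, d))              # residual corrective paths

W1, W2 = S1 + D1, S2 + D2
eps = np.linalg.norm(S1 - S2, 2) + np.linalg.norm(D1 - D2, 2)   # spectral (operator) norms

x = rng.normal(size=d)
lhs = np.linalg.norm(W1 @ x - W2 @ x)
print(lhs <= eps * np.linalg.norm(x))           # True: drift is controlled by structural closeness
```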
Analogous stability is observed in LoRA-X, where projecting the source adapter into the target model's dominant singular subspace yields small Frobenius-norm transfer error, provided the subspace-similarity metric between source and target layers is sufficiently high (Farhadzadeh et al., 27 Jan 2025).
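A sketch of this projection step in the spirit of LoRA-X is given below; the shapes and rank are illustrative, and the exact similarity metric and acceptance threshold used in the paper are not reproduced here.

```python
# Projection-based, data-free adapter transfer: restrict a source adapter to the
# dominant singular subspace of the target base weight (illustrative shapes/rank).
import numpy as np

rng = np.random.default_rng(3)
d_out, d_in, r = 128, 96, 16

W_tgt = rng.normal(size=(d_out, d_in))            # target base weight
dW_src = 0.01 * rng.normal(size=(d_out, d_in))    # adapter update from the source model (dense placeholder)

# Top-r singular subspaces of the target weight.
U, s, Vt = np.linalg.svd(W_tgt, full_matrices=False)
U_r, V_r = U[:, :r], Vt[:r, :].T

# Transferred adapter: source update projected onto the target's dominant subspace.
dW_tgt = U_r @ U_r.T @ dW_src @ V_r @ V_r.T

# Retained-energy ratio: a crude proxy for subspace alignment; low values suggest the
# source adapter lives mostly outside the target subspace and transfer may degrade.
print(np.linalg.norm(dW_tgt, "fro") / np.linalg.norm(dW_src, "fro"))
```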
4. Empirical Measurement and Protocols
Empirical assessment of cross-model stability transfer relies on suites of geometric and functional metrics computed over learned representations on fixed datasets.
Key geometric metrics:
- Linear CKA between embedding matrices $Z_1, Z_2$
- Principal-angle (subspace) overlap for the top-$k$ singular vectors
- Procrustes alignment error, $\min_{Q \in O(d)} \|Z_1 Q - Z_2\|_F$ (minimal implementations of these three metrics are sketched after this list)
- Numerical similarity of learned linear transformations across concept steering tasks (Nikooroo et al., 5 Aug 2025, Huang et al., 2 Jan 2025)
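The following sketch gives minimal implementations of the first three metrics, assuming two embedding matrices whose rows correspond to the same probe inputs; the Procrustes variant additionally assumes matching feature dimensions.

```python
# Sketch implementations of the geometric alignment metrics listed above.
# Z1, Z2: row-aligned embeddings of the same probe inputs from two models.
import numpy as np
from scipy.linalg import orthogonal_procrustes, subspace_angles

def linear_cka(Z1, Z2):
    """Linear CKA between column-centered embedding matrices."""
    Z1 = Z1 - Z1.mean(axis=0)
    Z2 = Z2 - Z2.mean(axis=0)
    num = np.linalg.norm(Z1.T @ Z2, "fro") ** 2
    den = np.linalg.norm(Z1.T @ Z1, "fro") * np.linalg.norm(Z2.T @ Z2, "fro")
    return num / den

def principal_angle_overlap(Z1, Z2, k=10):
    """Mean cosine of principal angles between top-k left singular subspaces."""
    U1 = np.linalg.svd(Z1 - Z1.mean(0), full_matrices=False)[0][:, :k]
    U2 = np.linalg.svd(Z2 - Z2.mean(0), full_matrices=False)[0][:, :k]
    return float(np.mean(np.cos(subspace_angles(U1, U2))))

def procrustes_error(Z1, Z2):
    """Orthogonal Procrustes alignment error (assumes matching feature dimensions)."""
    Q, _ = orthogonal_procrustes(Z1, Z2)
    return np.linalg.norm(Z1 @ Q - Z2, "fro") / np.linalg.norm(Z2, "fro")
```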
Key protocols:
- Train several models with different architectures or initializations on identical tasks/datasets.
- Compute penultimate-layer or relevant embeddings from a validation set.
- Measure alignment metrics and cross-model probe accuracy (e.g., logistic regression trained on one model's representations evaluated on another's); a minimal version of this probe protocol is sketched after this list.
- In testbed cases (e.g., LoRA-X, LLM SV transfer), evaluate transferability of adapters or steering vectors via end-task metrics (e.g., human preference score, generated image quality, LLM output alignment) (Farhadzadeh et al., 27 Jan 2025, Huang et al., 2 Jan 2025).
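The sketch below walks through the cross-model probe protocol on toy data: a linear probe fit on model A's embeddings is evaluated on model B's embeddings after a least-squares alignment step. The synthetic embeddings merely stand in for real $f_A(x)$, $f_B(x)$.

```python
# Cross-model probe protocol (toy data): train a probe on model A's embeddings,
# align model B's embeddings into A's space, and evaluate the probe on B.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
n, d, n_classes = 1000, 32, 5
y = rng.integers(0, n_classes, size=n)
Z_a = rng.normal(size=(n, d)) + 0.5 * y[:, None]                              # model A embeddings (toy)
Z_b = 0.2 * (Z_a @ rng.normal(size=(d, d))) + 0.1 * rng.normal(size=(n, d))   # model B embeddings (toy, related)

split = n // 2
# 1) Fit the alignment map on the first half, mapping B's space into A's space.
T, *_ = np.linalg.lstsq(Z_b[:split], Z_a[:split], rcond=None)
# 2) Train a linear probe on model A's embeddings.
probe = LogisticRegression(max_iter=1000).fit(Z_a[:split], y[:split])
# 3) Evaluate the probe on model B's aligned embeddings from the held-out half.
acc = probe.score(Z_b[split:] @ T, y[split:])
print(f"cross-model probe accuracy: {acc:.2%}")
```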
Table: Summary of Representative Cross-Model Alignment Results (Nikooroo et al., 5 Aug 2025)
| Model Pair | CKA (mean ± std) | Procrustes Error | Transfer Accuracy (%) |
|---|---|---|---|
| PGNN – MLP | 0.78 ± 0.02 | 0.12 ± 0.01 | 88.1 ± 1.1 |
| PGNN – CNN | 0.62 ± 0.03 | 0.26 ± 0.02 | — |
| MLP – CNN | 0.55 ± 0.04 | — | 75.6 ± 2.0 |
| PGNN_NoStruct–MLP | 0.69 ± 0.03 | — | 82.0 ± 1.4 |
These results indicate that explicit structural shaping (the operator $S$) in the architecture increases both metric alignment and functional interoperability (Nikooroo et al., 5 Aug 2025).
5. Methodological Extensions: Adapter, Steering Vector, and Feature-space Transfer
- LoRA-X parameterizes and transfers low-rank adapters as changes in the dominant SVD subspace of pretrained layer weights, yielding stable and data-free adapter transfer whenever the subspace similarity between source and target layers exceeds a threshold (Farhadzadeh et al., 27 Jan 2025).
- Concept Steering Vectors (SVs) in LLMs can be ported across different architectures using a learned ordinary-least-squares linear mapping, which is found empirically to generalize across concepts and model families (SSIM ≈ 0.87–0.95 among alignment transformations). This permits robust behavioral control transfer, e.g., modulating "harmfulness" or "sycophancy" by porting direction vectors across models without gradient access (Huang et al., 2 Jan 2025); a minimal porting sketch follows this list.
- TOFU (Transfer of Unstable Features) extracts "nuisance" (unstable) features from source tasks and uses them to partition target datasets, enforcing robustness by group-DRO minimax training over those clusters (Bao et al., 2021). Unlike feature reuse, this pipeline transfers the notion of instability/spuriousness rather than the absolute feature weights.
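A minimal sketch of the steering-vector porting idea, assuming paired hidden states from the two models on shared prompts; the hidden sizes, data, and injection point are illustrative assumptions rather than details from the cited paper.

```python
# Port a steering vector between two models via a learned OLS linear map.
# H_src, H_tgt: paired hidden states from the two models on the same prompts.
import numpy as np

rng = np.random.default_rng(5)
n_pairs, d_src, d_tgt = 2000, 512, 768
H_src = rng.normal(size=(n_pairs, d_src))                         # placeholder activations
H_tgt = 0.1 * (H_src @ rng.normal(size=(d_src, d_tgt))) + 0.05 * rng.normal(size=(n_pairs, d_tgt))

# Ordinary least squares map from source activation space to target activation space.
A, *_ = np.linalg.lstsq(H_src, H_tgt, rcond=None)

# A steering vector extracted in the source model (e.g., a concept direction).
v_src = rng.normal(size=d_src)

# Ported vector: add it (suitably scaled) to the target model's hidden states at the
# chosen layer to modulate the concept without gradient access.
v_tgt = v_src @ A
print(v_tgt.shape)  # (768,)
```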
6. Applications and Implications: Distillation, Modular Pipelines, and Robustness
Cross-model stability transfer supports a variety of system-level capabilities:
- Model distillation: Teacher-student alignment is facilitated if both share the same shaping operator, concentrating distillation loss on residuals (Nikooroo et al., 5 Aug 2025).
- Plug-and-play modularity: When modules are structured around a common shaping prior, they can be swapped between models—encoder-decoder or pipeline settings—with predictable bounds on transfer error (Nikooroo et al., 5 Aug 2025).
- Parameter-efficient adaptation: LoRA-X demonstrates that adapters can be transferred without retraining if SVD subspaces align, enabling scalable fine-tuning across model updates and derivative architectures (Farhadzadeh et al., 27 Jan 2025).
- Stable semantic interface: Steering vectors align across LLMs via a single mapping. Safety and control functions (e.g., refusing unsafe queries) can thus be ported among closed or proprietary models (Huang et al., 2 Jan 2025).
- Robustness to spurious correlations: By transferring only the unstable-feature partition, models can enforce invariance to known nuisance factors without propagating spurious dependencies (Bao et al., 2021).
Design recommendations: Explicitly separate shaping (inductive bias) from residual adaptation in layers, regularize shaping operators during model updates, ensure their spectra remain well-conditioned, and align on stable, low-frequency semantic subspaces where possible (Nikooroo et al., 5 Aug 2025).
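One possible concretization of these recommendations is a regularizer that penalizes both spectral ill-conditioning of the shaping operator and its drift across model updates; the penalty forms below are illustrative, not prescriptions from the cited work.

```python
# Illustrative regularizers for shaping operators during model updates:
# keep the spectrum well-conditioned and limit drift from a reference version.
import numpy as np

def spectral_conditioning_penalty(S, floor=1e-6):
    """Penalize a large condition number of the shaping operator S."""
    s = np.linalg.svd(S, compute_uv=False)
    cond = s.max() / max(s.min(), floor)
    return np.log(cond)                         # small when the spectrum is well-conditioned

def drift_penalty(S_new, S_ref):
    """Penalize operator-norm drift of S away from its previous (reference) version."""
    return np.linalg.norm(S_new - S_ref, 2)     # matches the operator-norm closeness in Section 3

# Usage sketch: total loss during an update of a model carrying shaping operator S.
# loss = task_loss + lam1 * spectral_conditioning_penalty(S) + lam2 * drift_penalty(S, S_prev)
```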
7. Limitations and Future Directions
Current methods succeed predominantly when the architectures share strong structural similarities or task-induced priors (e.g., singular subspaces, orthogonal latent semantics) (Farhadzadeh et al., 27 Jan 2025, Huang et al., 2 Jan 2025, Nikooroo et al., 5 Aug 2025). Deterioration is observed for highly divergent architectures or when weak prior alignment translates to low subspace or representation similarity.
Open directions include:
- Extending linear mappings to non-linear or kernel-based alignment for more distant models
- Automatic selection of modulation parameters in concept steering
- Standardized cross-model benchmarks encompassing broad concept and task taxonomies
- Theoretical delineation of the regimes in which linear or projection-based transfer breaks down
The field of cross-model stability transfer thus provides both conceptual and algorithmic tools for building increasingly interoperable, robust, and modular machine learning systems, with direct implications for foundation model scaling, open safety protocols, and long-term reliability (Nikooroo et al., 5 Aug 2025, Huang et al., 2 Jan 2025, Farhadzadeh et al., 27 Jan 2025, Bao et al., 2021).