Cross-architectural universal subspaces and architecture design

Determine the differences between the architecture-specific, layer-wise universal subspaces extracted from the weight matrices of distinct deep neural network architectures, and ascertain whether neural architectures can be explicitly designed to optimize the geometry of these universal subspaces.

Background

The paper advances the Universal Weight Subspace Hypothesis, presenting large-scale empirical evidence that many independently trained models across modalities and tasks converge to low-rank, architecture-specific, layer-wise subspaces in parameter space. The evidence comes from spectral analysis of over a thousand models, including LoRA adapters for Mistral-7B and SDXL, full Vision Transformers, LLaMA-8B, GPT-2, and Flan-T5.
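
To make the kind of joint, layer-wise subspace extraction described above concrete, here is a minimal sketch that stacks flattened weight matrices from many same-architecture models and takes the top right singular vectors as a shared basis. The function name, the fixed rank, and the absence of centering are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def extract_layer_subspace(weight_mats, rank):
    """Stack flattened weights (one row per model) and return the top-`rank`
    right singular vectors as an orthonormal basis for the shared subspace."""
    # weight_mats: list of arrays, all with the same shape for this layer.
    W = np.stack([w.ravel() for w in weight_mats])     # (n_models, d)
    _, S, Vt = np.linalg.svd(W, full_matrices=False)   # spectral analysis
    explained = np.cumsum(S**2) / np.sum(S**2)         # cumulative energy
    return Vt[:rank].T, explained[rank - 1]            # basis: (d, rank)

# Example with random stand-ins for one layer's trained weights.
rng = np.random.default_rng(0)
models = [rng.standard_normal((64, 64)) for _ in range(100)]
basis, energy = extract_layer_subspace(models, rank=10)
print(basis.shape, round(float(energy), 3))
```

Whether to center the stacked weights before the SVD, and how to pick the rank (e.g., by an explained-energy threshold), are modeling choices left open in this sketch.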

While the work focuses on models sharing the same architecture (where joint subspaces can be robustly extracted), the authors explicitly flag cross-architectural comparison as unresolved. They note that no current method compares subspaces across different architectures, and they raise two open questions: how these universal subspaces differ between architectures, and whether architectures can be intentionally designed to optimize subspace geometry.
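
A minimal sketch of the standard same-architecture comparison, principal angles between orthonormal bases, makes the obstacle concrete: once two layers live in different ambient dimensions, the computation is not even defined. The helper below is an assumption-laden illustration, not a method from the paper.

```python
import numpy as np

def principal_angles(U, V):
    """Principal angles (radians) between the subspaces spanned by the
    columns of orthonormal bases U and V, both of shape (d, k)."""
    sigma = np.linalg.svd(U.T @ V, compute_uv=False)
    return np.arccos(np.clip(sigma, -1.0, 1.0))

# Same ambient dimension d: the comparison is well defined.
rng = np.random.default_rng(1)
U, _ = np.linalg.qr(rng.standard_normal((4096, 10)))
V, _ = np.linalg.qr(rng.standard_normal((4096, 10)))
print(principal_angles(U, V))

# Different ambient dimensions (e.g., layers from two architectures):
# U.T @ V is shape-mismatched, so some alignment or common embedding
# step would be needed before any such comparison can be made.
```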

References

We leave open the question of cross-architectural comparison: how do the universal subspaces of distinct architectures differ, and can we explicitly design architectures to optimize the geometry of this subspace?

The Universal Weight Subspace Hypothesis (arXiv:2512.05117, Kaushik et al., 4 Dec 2025), Introduction (Section 1)