ScaleGMNs: Symmetry-Aware Graph Meta-Networks
- ScaleGMNs are graph neural network architectures that process entire networks while enforcing permutation and scaling equivariance to preserve inherent symmetries.
- They enable single-shot, fully-amortized optimization, achieving high accuracy and sparsity with significantly reduced tuning time compared to traditional methods.
- Their design extends to multi-scale analysis through spectral renormalization and supports applications like neural meta-optimization and digital twin modeling.
Scale Graph Metanetworks (ScaleGMNs) are a class of graph neural network–driven architectures designed to process entire neural networks as input objects, learning higher-order functions that act directly on the parameters and topologies of other neural networks. The essential innovation in ScaleGMNs lies in enforcing both permutation and scaling equivariance, fully respecting the symmetry group that leaves neural network input–output functions invariant. This symmetry-aware design underpins a wide range of applications, from fully-amortized single-shot optimization and weight-space canonicalization to meta-graph renormalization of complex networks and scalable network modeling. The following sections present a technical overview, construction, expressivity, practical algorithms, and empirical results demonstrating the significance of ScaleGMNs.
1. Symmetry Structure in Neural Networks
Neural network parameter spaces are characterized by large symmetry groups, primarily arising from two sources: neuron permutations within layers and scaling transformations applied to neurons and weights. For a feed-forward network with activations such as ReLU, tanh, or sine, the symmetry group comprises layerwise permutations $P_\ell$ and diagonal scalings $Q_\ell = \operatorname{diag}(q_\ell)$. Acting on weights and biases as

$$W_\ell \mapsto P_\ell Q_\ell\, W_\ell\, (P_{\ell-1} Q_{\ell-1})^{-1}, \qquad b_\ell \mapsto P_\ell Q_\ell\, b_\ell$$

leaves the realized network function unchanged for most commonly used activation functions, where the admissible scaling set depends on the activation ($q > 0$ for ReLU, $q \in \{\pm 1\}$ for tanh/sine) (Kalogeropoulos et al., 15 Jun 2024, Boufalis et al., 16 Nov 2025, Kuipers et al., 9 Oct 2025).
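The following is a minimal numerical sketch (toy dimensions and placeholder names assumed, not taken from the cited papers) verifying that a hidden-layer permutation combined with a positive diagonal scaling leaves a two-layer ReLU MLP's input–output function unchanged:

```python
# Toy check of the permutation-scaling symmetry for a 2-layer ReLU MLP.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hid, d_out = 4, 8, 3
W1, b1 = rng.normal(size=(d_hid, d_in)), rng.normal(size=d_hid)
W2, b2 = rng.normal(size=(d_out, d_hid)), rng.normal(size=d_out)

def mlp(x, W1, b1, W2, b2):
    return W2 @ np.maximum(W1 @ x + b1, 0.0) + b2

P = np.eye(d_hid)[rng.permutation(d_hid)]        # hidden-neuron permutation
Q = np.diag(rng.uniform(0.1, 5.0, size=d_hid))   # positive scaling (valid for ReLU)
T = P @ Q                                        # combined symmetry action on the hidden layer

W1t, b1t = T @ W1, T @ b1                        # W1 -> P Q W1, b1 -> P Q b1
W2t, b2t = W2 @ np.linalg.inv(T), b2             # W2 -> W2 (P Q)^{-1}

x = rng.normal(size=d_in)
assert np.allclose(mlp(x, W1, b1, W2, b2), mlp(x, W1t, b1t, W2t, b2t))
```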
2. Architecture and Equivariance Constraints in ScaleGMNs
ScaleGMNs represent a target neural network as a directed graph $G = (V, E)$, encoding neurons as nodes and weights as edge features. Every message-passing component is engineered to respect both neuron permutations and valid scaling transformations:
- Neuron embeddings: transform via $x_v \mapsto q_v\, x_v$ when neuron $v$ is rescaled by $q_v$ (and are permuted together with the neurons),
- Edge embeddings: satisfy $x_{uv} \mapsto q_v\, x_{uv}\, q_u^{-1}$ for an edge from $u$ to $v$, mirroring the transformation of the underlying weight.
At each propagation step, specialized modules guarantee equivariance:
- ScaleInv: Canonicalizes over scaling symmetry, producing scale-invariant statistics,
- ScaleEq: Produces outputs that are equivariant under scaling transformations,
- ReScaleEq: Ensures that composite message-passing steps respect heterogeneous scaling group actions for center, neighbor, and connecting edge (Kalogeropoulos et al., 15 Jun 2024, Kuipers et al., 9 Oct 2025, Boufalis et al., 16 Nov 2025).
Typically, the architecture consists of INIT, MSG, UPD, and READOUT modules, each enforcing equivariance at its stage. Readouts are further symmetrized to ensure invariance when required.
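To make these constraints concrete, the sketch below (an assumed PyTorch simplification, not the authors' exact ScaleInv/ScaleEq implementations) shows one way to build a scale-invariant canonicalizer from magnitude-normalized features and a message function that is equivariant to the receiving neuron's positive scaling; `ScaleInv` and `ScaleEqMsg` are illustrative names:

```python
import torch
import torch.nn as nn

class ScaleInv(nn.Module):
    """Maps a feature x to a quantity unchanged under x -> q * x with q > 0."""
    def forward(self, x, eps=1e-8):
        return x / (x.norm(dim=-1, keepdim=True) + eps)   # keeps direction only

class ScaleEqMsg(nn.Module):
    """Message m_{u->v} that rescales like the receiving neuron: m -> q_v * m."""
    def __init__(self, dim):
        super().__init__()
        self.inv = ScaleInv()
        self.gate_net = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x_v, x_u, w_uv):
        # Assumption: each neuron's scaling acts as a positive scalar on its whole
        # feature vector, so w_uv -> (q_v / q_u) * w_uv and x_u -> q_u * x_u.
        # The product w_uv * x_u then transforms as q_v * (w_uv * x_u); multiplying
        # it by a scale-invariant gate preserves q_v-equivariance of the message.
        gate = self.gate_net(torch.cat([self.inv(x_v), self.inv(x_u)], dim=-1))
        return gate * (w_uv * x_u)
```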
3. Single-Shot Fully-Amortized Optimization
ScaleGMNs serve as fully-amortized optimizers, mapping an input network directly to optimized parameters in a single forward pass:

$$\hat{\theta} = f_\phi(\theta, \mathcal{A}), \qquad \mathcal{L}(\hat{\theta}) = \mathcal{L}_{\text{task}}(\hat{\theta}) + \lambda\, \|\hat{\theta}\|_1,$$

where $\mathcal{L}_{\text{task}}$ might be, e.g., cross-entropy and the second term an $L_1$ sparsity penalty. This approach bypasses iterative SGD/Adam by amortizing weight updates over distributions of architecture–parameter pairs during meta-training. The ScaleGMN meta-update loss combines regularization and accuracy:

$$\mathcal{L}_{\text{meta}}(\phi) = \mathbb{E}_{(\theta, \mathcal{A})}\Big[\mathcal{L}_{\text{task}}\big(f_\phi(\theta, \mathcal{A})\big) + \lambda\, \big\|f_\phi(\theta, \mathcal{A})\big\|_1\Big].$$

Empirical evaluations demonstrate that on small CNN and MLP zoos, single-pass ScaleGMN fine-tuning achieves higher accuracy and sparsity in orders-of-magnitude less time than standard SGD, especially in architectures with rich symmetry (large gauge group). For example:

| Method | Avg Acc (%) | Max Acc (%) | Time (s) |
|---|---|---|---|
| ScaleGMN-B (CNN) | 50.3 | 55.1 | 0.055 |
| SGD (150 epochs) (CNN) | 44.5 | 51.4 | 357 |
| ScaleGMN-B (MLP) | 35.4 | 40.7 | 0.056 |
| SGD (150 epochs) (MLP) | 36.7 | 41.9 | 480 |

ScaleGMNs further achieve over 80% sparsity in a single application with $L_1$ regularization, compared to at most 70% for SGD after 150 epochs (Kuipers et al., 9 Oct 2025).
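A hedged sketch of the amortized meta-training loop this describes is given below; all interfaces (`scale_gmn`, `model_zoo`, `net.encode_as_graph`, `net.forward_with`) are assumed placeholders rather than a published API:

```python
import torch
import torch.nn.functional as F

def meta_train(scale_gmn, model_zoo, data_loader, lam=1e-3, lr=1e-4):
    """Meta-train a metanetwork that updates another network's weights in one pass."""
    opt = torch.optim.Adam(scale_gmn.parameters(), lr=lr)
    for x, y in data_loader:                      # one meta-update per data batch
        net = model_zoo.sample()                  # draw an (architecture, theta) pair
        graph = net.encode_as_graph()             # neurons -> nodes, weights -> edge features
        new_params = scale_gmn(graph)             # single-shot amortized parameter update
        logits = net.forward_with(new_params, x)  # evaluate the net with predicted weights
        # task loss plus L1 sparsity penalty on the predicted parameters
        loss = F.cross_entropy(logits, y) + lam * sum(p.abs().sum() for p in new_params)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return scale_gmn
```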
4. Canonicalization and Model Merging
ScaleGMN-based autoencoders facilitate neural network parameter canonicalization over the full permutation–scaling group $G$. The encoder $E$ is invariant to symmetry actions, mapping all equivalent networks (under $G$) to a unique code $z = E(\theta)$, and the decoder $D$ restores a canonical representative $\hat{\theta} = D(z)$. This canonicalization enables linear interpolation (merging) in latent space:

$$\theta(\alpha) = D\big(\alpha\, E(\theta_1) + (1 - \alpha)\, E(\theta_2)\big), \qquad \alpha \in [0, 1].$$

This produces smooth loss curves, with $\theta(\alpha)$ interpolating between the canonicalized networks $\theta_1$ and $\theta_2$ while avoiding the loss barriers typical of naive parameter-space interpolation. Experiments demonstrate near-zero interpolation barriers for Implicit Neural Representations (INRs) and CNNs, compared to sharp drops for naive or permutation-only methods. Quantitatively, INR reconstructions achieve test MSE 0.0106 (ScaleGMN) versus 0.0135 (NeuralGraphs), and CNNs on CIFAR-10 yield $L_1$-error in accuracy / Kendall's $\tau$ of 0.0111 / 0.9100 (ScaleGMN, ReLU) versus 0.0142 / 0.8914 (NeuralGraphs, ReLU) (Boufalis et al., 16 Nov 2025).
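A minimal sketch of merging via such a canonicalizing autoencoder, assuming placeholder encoder/decoder modules `E` and `D` with the invariance properties described above:

```python
# Merge two networks by interpolating their symmetry-invariant latent codes.
def merge(E, D, theta_a, theta_b, alpha=0.5):
    z_a, z_b = E(theta_a), E(theta_b)        # invariant codes: equivalent nets share a code
    z = alpha * z_a + (1.0 - alpha) * z_b    # linear interpolation in latent space
    return D(z)                              # decode a canonical merged representative

# Sweeping alpha over [0, 1] and evaluating the decoded networks traces the
# (near-barrier-free) interpolation curves reported for INRs and CNNs.
```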
5. Expressivity and Theoretical Guarantees
ScaleGMNs are universal for both forward- and backward-pass simulation in feed-forward neural networks under suitable conditions. A bidirectional ScaleGMN with $L$ layers and expressive ScaleEq/ScaleInv modules can embed all forward pre- and post-activations and all backpropagated gradients within its node and edge features. This property is formally stated in theorems guaranteeing recovery of the computational traces of arbitrary $L$-layer networks through message passing (Kalogeropoulos et al., 15 Jun 2024).
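To make the forward-simulation claim concrete, the toy sketch below (an assumed simplification, not the construction used in the proofs) shows how message passing over an MLP's computation graph, with one message per weighted edge and a ReLU node update, reproduces the network's own activations layer by layer:

```python
import numpy as np

def forward_by_message_passing(weights, biases, x):
    """weights[i]: (d_out, d_in) matrix of layer i; biases[i]: (d_out,) vector."""
    acts = np.asarray(x, dtype=float)
    for W, b in zip(weights, biases):
        # each node v aggregates messages W[v, u] * acts[u] from its in-neighbors u,
        # then applies the activation; this is exactly the layer's forward pass
        messages = W * acts[None, :]              # one message per incoming edge
        acts = np.maximum(messages.sum(axis=1) + b, 0.0)
    return acts                                   # ReLU kept on the last layer for simplicity
```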
Furthermore, the gauge symmetry group induced by scaling is strictly smaller in convolutional neural networks than in multi-layer perceptrons. For fully connected layers, the scaling group is the full product of per-neuron diagonal groups (with dimension equal to the number of hidden neurons), while the convolutional weight structure restricts admissible scalings to global rather than per-channel factors, reducing the dimension to 2. This explains performance differences observed in scale-equivariant metanetwork optimization between MLP and CNN architectures (Kuipers et al., 9 Oct 2025).
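As a back-of-the-envelope illustration of this dimension count (toy layer widths assumed), the MLP scaling gauge grows with every hidden neuron, whereas the convolutional restriction to global scalings leaves only the two degrees of freedom stated above:

```python
def mlp_scaling_gauge_dim(hidden_widths):
    # one independent positive scaling factor per hidden neuron (ReLU case)
    return sum(hidden_widths)

print(mlp_scaling_gauge_dim([64, 64, 32]))  # -> 160 scaling degrees of freedom
# Under convolutional weight sharing, only global scalings remain,
# i.e. a gauge dimension of 2 per the cited analysis.
```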
6. Graph Meta-Network Renormalization and Spectral Scaling
Beyond neural network optimization, the concept of scale in graph meta-networks extends to Laplacian renormalization of complex graphs via spectral-space renormalization group (SS-RG) transformations. In this context, a ScaleGMN refers to the meta-level network reconstructed by coarse-graining eigenmodes of the normalized graph Laplacian:
- Critical exponents (see formulas in the source) are determined self-consistently as invariant functions of the degree exponent $\gamma$.
- A stepwise procedure constructs supernodes, meta-edges, and a binarized Laplacian, revealing latent multi-scale structure and emergent long-range meta-links (e.g., Denmark–Spain in the European power grid); see the sketch after this list.
- SS-RG is non-recursive: Each coarse-graining parameter yields an inequivalent meta-network, highlighting crossovers and multi-scale features not visible to real-space RG (Kim et al., 10 Jul 2025).
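The following is a minimal sketch (an assumed toy pipeline, not the exact SS-RG procedure of the cited work) of one spectral coarse-graining step: keep the slowest nontrivial eigenmodes of the normalized Laplacian, group nodes into supernodes by their dominant retained mode, and connect supernodes that share an original edge to form a binarized meta-network.

```python
import numpy as np
import networkx as nx

def spectral_coarse_grain(G, k):
    """Group nodes by their dominant slow Laplacian eigenmode and binarize meta-edges."""
    idx = {n: i for i, n in enumerate(G.nodes())}
    L = nx.normalized_laplacian_matrix(G).toarray()
    _, vecs = np.linalg.eigh(L)                  # eigenvectors sorted by eigenvalue
    modes = vecs[:, 1:k + 1]                     # k slowest nontrivial eigenmodes
    labels = np.argmax(np.abs(modes), axis=1)    # supernode assignment per node
    meta = nx.Graph()
    meta.add_nodes_from(range(k))
    for u, v in G.edges():
        cu, cv = labels[idx[u]], labels[idx[v]]
        if cu != cv:
            meta.add_edge(cu, cv)                # binarized meta-edge between supernodes
    return meta

# Example: coarse-grain a scale-free-like graph into 5 supernodes.
meta = spectral_coarse_grain(nx.barabasi_albert_graph(200, 2), k=5)
```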
7. Applications, Empirical Findings, and Limitations
Applications of ScaleGMNs are diverse:
- Single-shot fine-tuning and pruning of deep networks (CNNs, MLPs) outperforming or matching many-step SGD in accuracy and sparsity,
- Canonicalization and merging for model ensembling and weight-space navigation,
- Digital twin modeling of large communication networks, where parameterizations enable generalization from small to much larger graphs (scalability validated empirically up to 300-node networks with modest error increase) (Kuipers et al., 9 Oct 2025, Boufalis et al., 16 Nov 2025, Ferriol-Galmés et al., 2021),
- Multi-scale meta-graph analysis of complex networks (SS-RG perspective) (Kim et al., 10 Jul 2025).
Reported limitations include instability under ReLU scaling when employing bidirectional canonicalization, current restrictions to basic architectures (extension to transformers and normalization/residual modules remains open), and the computational burden of training amortized operators or invariant autoencoders for large models. Future directions target more robust canonicalization for continuous scaling groups, extensions to broader symmetry groups (orthogonal, rotational, channel permutation), and semi-amortized schemes integrating equivariant meta-updates with local optimization (Kuipers et al., 9 Oct 2025, Boufalis et al., 16 Nov 2025, Kalogeropoulos et al., 15 Jun 2024).
Scale Graph Metanetworks represent a theoretically grounded, symmetry-aware approach to higher-order graph neural network architectures, with strong empirical and formal justification for applications spanning neural meta-optimization, parameter space canonicalization, and multi-scale analysis of complex networks. Their strict adherence to the full permutation and scaling symmetry group enables efficiency, generalization, and robustness in scenarios where such symmetries dominate the problem structure.