Enhanced Capacity in GNN Architectures

Updated 10 October 2025
  • Recent architectures leverage high-order derivatives and random wiring to enhance expressivity and overcome traditional message-passing limitations.
  • Adaptive depth mechanisms and matrix-valued node memories improve capacity and counteract over-squashing in complex graph structures.
  • Empirical gains in graph classification and regression tasks validate the enhanced efficiency, scalability, and task-specific adaptability of these designs.

A Graph Neural Network (GNN) architecture with improved capacity is one that transcends limitations of depth, information flow, representation richness, or task-specific adaptation, enabling more accurate modeling of complex graph-structured data. Capacity improvements have been achieved through architectural innovations such as deeper or more expressive layers, flexible wiring and aggregation, dynamic per-node adaptation, enhanced memory or internal storage, and extensive neural or evolutionary search over design spaces.

1. Architectures That Increase Expressivity

Several recent advances push the expressive boundaries of GNNs by modifying how information is processed at the node and graph level:

  • High-Order Derivative GNN (HOD-GNN) (Eitan et al., 2 Oct 2025): This architecture augments standard MPNNs with derivative tensors, that is, high-order partial derivatives of node or graph embeddings with respect to input features, which encode local structure and higher-order dependencies unavailable to classical MPNNs. These derivative features are encoded (via invariant or DeepSet networks) and concatenated with standard node features, and a second GNN operates on the enriched representation. HOD-GNN aligns its expressivity with the Weisfeiler-Leman hierarchy, theoretically enabling discrimination of certain non-isomorphic graphs that defeat standard message-passing schemes. Computational efficiency is achieved by propagating derivative tensors with a message-passing-like, layer-wise sparse procedure that exploits linearity and graph sparsity (a first-order sketch of the derivative-feature idea appears after this list).
  • Randomly Wired Architectures (RAN-GNN) (Valsesia et al., 2021): Departing from the traditional sequential stack of layers, RAN-GNN arranges architecture nodes (each implementing a graph convolutional layer) in a randomly sampled directed acyclic graph. Each node aggregates information from a random set of predecessors with learned scalar mixing weights. This wiring produces an effective ensemble of paths of various lengths, supporting multi-scale information integration and diversified receptive fields. The resultant model adapts pathway usage based on task demands, increasing model capacity beyond what can be gained by deeper stacking alone.
  • gLSTM: Matrix-Valued Node Memories (Blayney et al., 9 Oct 2025): To mitigate over-squashing (where complex, long-range information is collapsed into fixed-size vectors), gLSTM provides each node with a matrix-valued associative memory updated via outer-product key-value mechanisms (inspired by fast weight programmers and xLSTMs). Learned forget, input, and output gates (using exponential activations) control memory updates and reads, enabling storage and selective retrieval of multiple items, a capacity unattainable with vector-only representations (see the memory-cell sketch after this list).
  • NGNN: Intra-layer Modularity (Song et al., 2021): Rather than stacking more GNN layers (with their attendant risk of over-smoothing) or simply widening the hidden dimension, NGNN improves per-layer capacity by inserting independent multi-layer perceptrons (MLPs) into each aggregation block, yielding deeper, more expressive transformations within each message-passing layer (a minimal layer sketch also follows this list).
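
The following is a minimal, first-order sketch of the derivative-feature idea behind HOD-GNN, written in plain PyTorch with a dense normalized adjacency for readability. The single GCN step, the Jacobian pooling, and all names are illustrative assumptions; the actual architecture uses higher-order derivatives, invariant/DeepSet encoders, and a sparse layer-wise propagation scheme.

```python
import torch

def gcn_layer(A_hat, X, W):
    # One dense GCN-style step: normalized adjacency times transformed features.
    return torch.relu(A_hat @ X @ W)

def derivative_features(A_hat, X, W):
    # First-order stand-in for HOD-GNN's derivative tensors: differentiate the
    # node embeddings with respect to the input features, then pool the
    # resulting Jacobian into fixed-size per-node channels (the pooling is an
    # illustrative substitute for the paper's DeepSet/invariant encoders).
    f = lambda inp: gcn_layer(A_hat, inp, W)
    J = torch.autograd.functional.jacobian(f, X)   # shape [N, d_out, N, d_in]
    deriv_feats = J.abs().sum(dim=(2, 3))          # [N, d_out] per-node summary
    return torch.cat([gcn_layer(A_hat, X, W), deriv_feats], dim=-1)

# Toy usage on a 4-node path graph.
N, d_in, d_out = 4, 3, 8
A = torch.tensor([[0., 1., 0., 0.],
                  [1., 0., 1., 0.],
                  [0., 1., 0., 1.],
                  [0., 0., 1., 0.]])
A_hat = A + torch.eye(N)
d = A_hat.sum(-1).pow(-0.5)
A_hat = d[:, None] * A_hat * d[None, :]            # symmetric normalization
X, W = torch.randn(N, d_in), torch.randn(d_in, d_out)
print(derivative_features(A_hat, X, W).shape)      # torch.Size([4, 16])
```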
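
Next, a minimal sketch of a matrix-valued node memory in the spirit of gLSTM's outer-product key-value updates and exponential gates. The cell below omits the output gate and the stabilizing normalization of the full model; the module names and dimensions are assumptions for illustration.

```python
import torch
import torch.nn as nn

class MatrixMemoryCell(nn.Module):
    # Minimal sketch of a matrix-valued node memory in the spirit of gLSTM:
    # each node holds a d_k x d_v associative matrix that is written with
    # outer-product key/value updates and read with a query. The gating below
    # is a simplified stand-in (no output gate or stabilizing normalizer).
    def __init__(self, d_in, d_k, d_v):
        super().__init__()
        self.key = nn.Linear(d_in, d_k)
        self.value = nn.Linear(d_in, d_v)
        self.query = nn.Linear(d_in, d_k)
        self.forget_gate = nn.Linear(d_in, 1)
        self.input_gate = nn.Linear(d_in, 1)

    def forward(self, M, msg):
        # M: [N, d_k, d_v] per-node memories; msg: [N, d_in] aggregated messages.
        k, v, q = self.key(msg), self.value(msg), self.query(msg)
        f = torch.exp(self.forget_gate(msg))[..., None]   # exponential gates
        i = torch.exp(self.input_gate(msg))[..., None]
        # Decay the old memory and add the new key/value association.
        M = f * M + i * torch.einsum('nk,nv->nkv', k, v)
        read = torch.einsum('nk,nkv->nv', q, M)           # associative read-out
        return M, read

# Toy usage: 5 nodes, 16-dim messages, 8x8 memories.
cell = MatrixMemoryCell(d_in=16, d_k=8, d_v=8)
M = torch.zeros(5, 8, 8)
new_M, read = cell(M, torch.randn(5, 16))
print(new_M.shape, read.shape)   # torch.Size([5, 8, 8]) torch.Size([5, 8])
```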
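
Finally, a compact sketch of NGNN-style intra-layer modularity, in which a small MLP adds depth inside a single message-passing layer; the dense propagation and layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class NGNNLayer(nn.Module):
    # Sketch of NGNN-style intra-layer modularity: an MLP is applied after the
    # aggregation step inside a single message-passing layer (dense GCN-style
    # propagation is used here purely for illustration).
    def __init__(self, d_in, d_hidden, d_out):
        super().__init__()
        self.lin = nn.Linear(d_in, d_hidden)
        self.mlp = nn.Sequential(                 # extra depth within the layer
            nn.Linear(d_hidden, d_hidden), nn.ReLU(),
            nn.Linear(d_hidden, d_out),
        )

    def forward(self, A_hat, X):
        h = torch.relu(A_hat @ self.lin(X))       # aggregate, then transform
        return self.mlp(h)
```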

2. Mechanisms for Adaptive Capacity and Depth

Recent research highlights the need for flexible, node-adaptive computation, particularly when local structural or semantic requirements vary:

  • ADMP-GNN: Adaptive Depth Message Passing (Abbahaddou et al., 1 Sep 2025): This architecture introduces layerwise early-exit mechanisms. At each message-passing layer, nodes produce predictions using a dedicated “exit” head. During inference, each node's exit depth is assigned via clustering on structural features (degree, k-core, PageRank, walk count). Training employs either aggregate loss minimization or a sequential strategy (layerwise freezing and finetuning). Empirically, this adaptive approach lets nodes that require few propagation steps (e.g., those in dense regions) exit early while others propagate deeper, avoiding the over-smoothing imposed by a single universal depth (an early-exit sketch appears after this list).
  • NDGGNET: Node-Degree-Based Gates (Tang et al., 2022): Layer contributions are modulated by per-node, per-layer gates computed as a function of node degree and other node-specific features. For highly connected (dense) nodes, the gate emphasizes historical (residual) information, countering over-smoothing; for sparsely connected nodes, the gate amplifies fresh layer-wise updates, enabling deep aggregation in information-starved regions (see the gating sketch after this list).
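
A minimal sketch of the early-exit mechanism described above, assuming the per-node exit depths have already been computed (e.g., by clustering structural features); the layer and head definitions are illustrative, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class EarlyExitGNN(nn.Module):
    # Sketch of adaptive-depth message passing in the spirit of ADMP-GNN:
    # every layer owns a prediction head, and at inference each node reads its
    # logits from an assigned exit depth. Here the exit assignment is passed in
    # directly; the paper derives it by clustering structural node features.
    def __init__(self, d_in, d_hidden, n_classes, n_layers=3):
        super().__init__()
        dims = [d_in] + [d_hidden] * n_layers
        self.lins = nn.ModuleList([nn.Linear(dims[i], dims[i + 1]) for i in range(n_layers)])
        self.heads = nn.ModuleList([nn.Linear(dims[i + 1], n_classes) for i in range(n_layers)])

    def forward(self, A_hat, X, exit_depth):
        h, per_layer = X, []
        for lin, head in zip(self.lins, self.heads):
            h = torch.relu(A_hat @ lin(h))
            per_layer.append(head(h))                       # one exit per layer
        logits = torch.stack(per_layer, dim=0)              # [L, N, C]
        idx = exit_depth.view(1, -1, 1).expand(1, -1, logits.size(-1))
        return logits.gather(0, idx).squeeze(0)             # per-node chosen exit

# Toy usage: 6 nodes, self-loop-only adjacency as a placeholder.
N, d_in = 6, 4
model = EarlyExitGNN(d_in, d_hidden=16, n_classes=3)
exit_depth = torch.tensor([0, 0, 1, 2, 2, 1])               # assigned exit layer per node
out = model(torch.eye(N), torch.randn(N, d_in), exit_depth)
print(out.shape)                                            # torch.Size([6, 3])
```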
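
And a brief sketch of a degree-conditioned gate that blends residual and freshly aggregated information; the gate network and its sigmoid parameterization are assumptions standing in for NDGGNET's exact formulation.

```python
import torch
import torch.nn as nn

class DegreeGatedLayer(nn.Module):
    # Sketch of an NDGGNET-style node-degree gate: a scalar gate per node,
    # computed from its degree, blends the residual state with the freshly
    # aggregated update. The gate network and its inputs are illustrative;
    # the original model can also condition on other node-specific features.
    def __init__(self, d):
        super().__init__()
        self.lin = nn.Linear(d, d)
        self.gate = nn.Sequential(
            nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid()
        )

    def forward(self, A_hat, H, degrees):
        update = torch.relu(A_hat @ self.lin(H))            # fresh layer-wise update
        g = self.gate(degrees.float().unsqueeze(-1))        # [N, 1] learned gate
        # A large gate keeps more history (useful for dense nodes prone to
        # over-smoothing); a small gate takes more of the new aggregation.
        return g * H + (1 - g) * update

# Toy usage.
N, d = 5, 8
layer = DegreeGatedLayer(d)
degrees = torch.tensor([4, 1, 2, 7, 3])
print(layer(torch.eye(N), torch.randn(N, d), degrees).shape)   # torch.Size([5, 8])
```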

3. Architecture Search and Topology Optimization

Substantial progress has occurred through automated search and design over the space of GNN topologies and feature aggregation strategies:

  • GNAS (Graph Neural Architecture Search) (Cai et al., 2021): GNAS decomposes message passing into fine-grained atomic operations (feature filtering and neighbor aggregation), organizing them via a layered tree-topology structure (the Graph Neural Architecture Paradigm, GAP). Neural architecture search, with a differentiable relaxation of operator selection, explores combinations of filtering and aggregation at varying depths. GNAS automatically identifies architectures that balance feature selectivity and structural integration, outperforming both manual and prior search-based GNNs on regression and classification benchmarks (a sketch of this kind of differentiable operator relaxation follows the list).
  • F²GNN: Feature Fusion as Topology Design (Wei et al., 2021): This unifies “stacking” and “multi-path” aggregation through explicit feature selection and fusion mechanisms at each layer. The architecture forms a directed acyclic computation graph in which SFA (Selection, Fusion, Aggregation) blocks adaptively combine outputs from any previous layers using operations such as SUM, MEAN, MAX, CONCAT, LSTM, and ATT. A differentiable neural architecture search selects the optimal sequence, depth, and topology for a target task, substantially enhancing the capacity for both local and global structure integration.
  • Auto-GNN / Evolutionary NAS (Zhou et al., 2019, Shi et al., 2020): Automated frameworks for discovering high-capacity GNNs use either reinforcement learning or evolutionary genetic algorithms to optimize architecture and, in the latter case, learning hyperparameters in an alternating interleaved fashion. These approaches enable navigation of vast, expressive design spaces (encompassing attention, aggregation, activation, and dimension choices), strengthening model capacity and task adaptability.
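
The core search trick shared by GNAS and F²GNN, a differentiable (softmax) relaxation over a discrete set of candidate operations, can be sketched as follows; the candidate set, mixing scheme, and discretization step are simplified assumptions rather than either paper's full search space.

```python
import torch
import torch.nn as nn

class MixedAggregation(nn.Module):
    # Sketch of differentiable operator selection: candidate transformations
    # are blended with softmax-relaxed architecture weights learned jointly
    # with the model weights, then discretized after the search. The candidate
    # set here is a small illustrative assumption.
    def __init__(self, d):
        super().__init__()
        self.ops = nn.ModuleDict({
            'identity': nn.Identity(),
            'linear': nn.Linear(d, d),
            'mlp': nn.Sequential(nn.Linear(d, d), nn.ReLU(), nn.Linear(d, d)),
        })
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))   # architecture weights

    def forward(self, A_hat, H):
        weights = torch.softmax(self.alpha, dim=0)
        aggregated = A_hat @ H                                  # neighbor aggregation
        mixed = sum(w * op(aggregated) for w, op in zip(weights, self.ops.values()))
        return torch.relu(mixed)

    def discretize(self):
        # After search, keep only the highest-weighted candidate operation.
        return list(self.ops.keys())[int(self.alpha.argmax())]

# Toy usage.
block = MixedAggregation(d=8)
out = block(torch.eye(5), torch.randn(5, 8))
print(out.shape, block.discretize())
```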

4. Empirical Gains, Benchmark Performance, and Efficiency

Experimental validation and ablation studies support the capacity claims of these architectures:

  • Expressivity: HOD-GNN outperforms vanilla MPNNs and even highly expressive subgraph GNNs on substructure counting, classification, and regression tasks, while RAN-GNN demonstrates lower error rates and improved learning curves relative to deep residual GNNs, especially as architectural parameterization increases.
  • Memory and Storage: For gLSTM, synthetic tasks such as Neighbor Associative Recall show that retaining and retrieving information from expanding neighborhoods demands substantial memory capacity; standard GCNs degrade rapidly, whereas gLSTM maintains associative recall up to the theoretical capacity threshold.
  • Scalability: Binary GCNs (Bi-GCNs) (Wang et al., 2022) address capacity under efficiency constraints by compressing weights and input features to binary format, with theoretical guidance (the Entropy Cover Hypothesis) on the hidden-layer width required to maintain representational capacity in the compressed domain. Notably, this yields roughly 31x memory reduction and 51x inference speedup while retaining competitive accuracy (a binarization sketch follows this list).
  • Task Adaptivity: With an oracle assigning each node its best exit depth, ADMP-GNN reaches accuracy close to 89.43% on Cora, surpassing the best static-depth baselines and illustrating the practical value of adaptive computation.
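
A sketch of the weight-binarization idea behind Bi-GCN, using a straight-through estimator so that training remains differentiable; the scaling scheme and the restriction to weights only are simplifying assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BinarizedLinear(nn.Linear):
    # Sketch of 1-bit weight binarization in the spirit of Bi-GCN layers:
    # the forward pass uses sign(weight) scaled by the mean absolute weight,
    # while a straight-through estimator lets gradients flow to the
    # full-precision weights. This is a generic binarization recipe, not the
    # paper's exact scheme (which also binarizes input features).
    def forward(self, x):
        scale = self.weight.abs().mean()
        w_bin = torch.sign(self.weight) * scale
        w = self.weight + (w_bin - self.weight).detach()   # straight-through
        return F.linear(x, w, self.bias)

# Toy usage inside a dense GCN-style propagation step.
N, d_in, d_out = 4, 8, 16
lin = BinarizedLinear(d_in, d_out)
H = torch.relu(torch.eye(N) @ lin(torch.randn(N, d_in)))
print(H.shape)                                             # torch.Size([4, 16])
```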

5. Synthesis of Sequence Modeling and GNN Paradigms

Capacity improvements have also resulted from importing principles from sequence modeling and associative memory:

  • Associative Memories, Fast Weights, and xLSTM Inspiration: The gLSTM model translates the core idea of matrix-valued, revisable internal memory from xLSTM, applying outer-product fast weights and gating for key-value storage/retrieval in the node state, directly countering the information bottleneck that causes over-squashing.
  • Sequential Training and Multi-Exit Schemes: The sequential, layerwise training procedure seen in ADMP-GNN (progressive freezing and refinement) is analogous to strategies used for training multi-exit deep convolutional networks in sequence and vision tasks.

6. Future Prospects and Theoretical Implications

Architectures that explicitly model, regulate, and optimize capacity, whether by leveraging higher-order derivatives, nonsequential wiring, adaptive memory, or automated architecture search, demonstrate that GNN expressivity can be extended without incurring the pitfalls of over-smoothing or bottlenecking. This suggests a trend toward modular, adaptive, and dynamically routed GNNs that match computation and storage demands to node-, edge-, and task-level complexity.

These design philosophies are likely to support advances in domains requiring high-fidelity integration of local and global graph structure, including molecular modeling, social network inference, knowledge graph construction, and large-scale scientific computation. Emerging directions will likely address further scalability, efficient exploitation of derivative information, robustness under capacity constraints, and integration with sparse or hardware-efficient computational paradigms.


| Architecture | Capacity Improvement Mechanism | Application Gain |
|---|---|---|
| HOD-GNN (Eitan et al., 2 Oct 2025) | High-order derivatives, structure encoding | Expressive graph classification |
| RAN-GNN (Valsesia et al., 2021) | Random wiring; ensemble of paths | Multi-scale receptive fields |
| gLSTM (Blayney et al., 9 Oct 2025) | Matrix associative memory, fast weights | Long-range, memory-bound tasks |
| ADMP-GNN (Abbahaddou et al., 1 Sep 2025) | Node-adaptive message-passing depth | Avoids over-/under-smoothing |
| AGNN / Genetic-GNN (Zhou et al., 2019; Shi et al., 2020) | NAS, configuration/hyperparameter evolution | Task- and graph-specific design |
| F²GNN (Wei et al., 2021) | Topology fusion, NAS over feature graphs | Adaptive multi-level aggregation |
| Bi-GCN (Wang et al., 2022) | Binarization, Entropy Cover Hypothesis | Resource-constrained, efficient inference |

The integration of architectural, algorithmic, and combinatorial advances continues to drive the performance frontier for capacity-improved GNNs.
