Graph Neural Network Architecture
- Graph neural network architecture is a framework that adapts convolutional operations to irregular, non-Euclidean graphs using graph signal processing operators.
- It employs specialized pooling and downsampling techniques based on graph topology to achieve deep, hierarchical feature extraction.
- Advanced designs like selection and aggregation GNNs demonstrate high accuracy in tasks such as classification and authorship attribution while preserving graph-specific properties.
Graph neural network architecture refers to the set of design principles, mathematical formulations, and practical layer constructions enabling neural networks to process signals supported on arbitrary graphs. These architectures generalize convolutional neural networks (CNNs) from regular domains (e.g., images, time series) to irregular, non-Euclidean domains, leveraging the structure of the underlying graph for feature learning, signal transformation, and hierarchical representation.
1. Foundations and Motivations
Traditional CNNs exploit spatial or temporal regularity through linear time-invariant (LTI) filters and local pooling, producing translation-equivariant representations. However, many signals—such as those on social, citation, biological, or sensor networks—are naturally supported on graphs where "neighbor" and "neighborhood" are defined by the graph’s adjacency structure, not by a regular grid or sequence.
Graph neural network (GNN) architectures are constructed to:
- Replace LTI convolution with graph signal processing operators (e.g., linear shift-invariant [LSI] graph filters);
- Adapt pooling and downsampling to the connectivity of the graph;
- Enable deep, hierarchical feature extraction analogous to deep CNNs while addressing the challenges of irregular support and non-uniform neighborhood sizes;
- Preserve graph-specific properties such as equivariance to node permutations and stability to small changes in graph topology; a numerical check of equivariance appears after this list.
These principles motivate novel architectural components that extend beyond the classic aggregation of node features.
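The permutation-equivariance property noted above is straightforward to verify numerically. Below is a minimal sketch, assuming a symmetric random shift operator and a three-tap polynomial filter; the function name `graph_filter` and the test setup are illustrative, not taken from the cited papers.
```python
import numpy as np

def graph_filter(S, x, h):
    """Apply the polynomial graph filter y = sum_k h_k S^k x."""
    y, Skx = np.zeros_like(x), x.copy()
    for hk in h:
        y += hk * Skx      # add the k-hop term h_k S^k x
        Skx = S @ Skx      # diffuse one more hop
    return y

rng = np.random.default_rng(0)
n = 6
S = rng.random((n, n))
S = (S + S.T) / 2                             # symmetric shift operator
x = rng.standard_normal(n)                    # graph signal
h = np.array([0.5, 0.3, 0.2])                 # filter taps h_0, h_1, h_2

P = np.eye(n)[rng.permutation(n)]             # random permutation matrix
y = graph_filter(S, x, h)
y_perm = graph_filter(P @ S @ P.T, P @ x, h)  # same filter on the relabeled graph
assert np.allclose(P @ y, y_perm)             # output permutes with the nodes
```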
2. Graph Filters and Convolutional Layers
The convolutional layer in a graph neural network replaces the regular grid-based convolution with a linear shift-invariant filter acting on graph signals. For a graph with shift operator $S$ (often the adjacency matrix or a Laplacian variant), a graph signal $x$ is filtered as a polynomial of $S$:
$y = \sum_{k=0}^{K-1} h_k S^k x$
Here, the coefficients $h_k$ are learnable filter parameters, and $S^k$ diffuses the signal $k$ hops away in the graph, capturing local or global structure depending on $K$. In the special case when $S$ is circulant (e.g., a time signal), this reduces to the standard convolution.
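The claim that a circulant $S$ recovers ordinary convolution can be confirmed directly; the following is a minimal sketch using the cyclic shift matrix, with an FFT-based circular convolution as the reference (all variable names are illustrative).
```python
import numpy as np

n = 8
S = np.roll(np.eye(n), 1, axis=0)       # cyclic shift: (Sx)[i] = x[(i-1) mod n]
x = np.random.default_rng(1).standard_normal(n)
h = np.array([0.6, 0.3, 0.1])           # filter taps h_0, h_1, h_2

# Polynomial graph filter y = sum_k h_k S^k x ...
y = sum(hk * np.linalg.matrix_power(S, k) @ x for k, hk in enumerate(h))

# ... equals the circular convolution of (zero-padded) h with x.
y_conv = np.real(np.fft.ifft(np.fft.fft(np.pad(h, (0, n - len(h)))) * np.fft.fft(x)))
assert np.allclose(y, y_conv)
```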
Some architectures, such as the selection GNN (Gama et al., 2018), implement this as:
$x_1^{(fg)} = \sum_{k=0}^{K_1-1} [h_1^{(fg)}]_k S^k x_0^{(g)}$
This enables a direct extension of LTI filters to graphs. More complex variants incorporate multidimensional edge features, operating not just on node features but also on edge attributes via learnable functions over combined vertex and edge information (Gadiya et al., 2018).
This explicit edge convolution leads to better exploitation of rich, multi-relational graph data.
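A minimal sketch of the filter-bank layer in the selection-GNN equation above, assuming $G$ input features, $F$ output features, $K$ taps per input-output pair, and a ReLU pointwise nonlinearity; the names `gnn_conv_layer` and `H` are illustrative, and edge-feature variants are omitted.
```python
import numpy as np

def gnn_conv_layer(S, X0, H):
    """S: (n, n) shift operator; X0: (n, G) input features; H: (F, G, K) taps."""
    F, G, K = H.shape
    # Precompute the diffusion sequence S^k X0 for k = 0..K-1.
    diff = [X0]
    for _ in range(K - 1):
        diff.append(S @ diff[-1])
    diff = np.stack(diff)                       # shape (K, n, G)
    # x_1^(f) = sigma( sum_g sum_k [h^(fg)]_k S^k x_0^(g) )
    X1 = np.einsum('fgk,kng->nf', H, diff)
    return np.maximum(X1, 0.0)                  # pointwise ReLU

# Usage on a random 10-node graph: G=2 input features, F=4 outputs, K=3 taps.
rng = np.random.default_rng(2)
S = (rng.random((10, 10)) < 0.3).astype(float)
X1 = gnn_conv_layer(S, rng.standard_normal((10, 2)), rng.standard_normal((4, 2, 3)))
```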
3. Pooling, Downsampling, and Hierarchical Composition
Pooling in GNNs cannot rely on regular subsampling; instead, nodes are aggregated according to graph topology. The selection GNN interprets pooling as two steps (Gama et al., 2018):
- Neighborhood summarization: $u_n = \rho\big(\{x_m : m \in \mathcal{N}(n)\}\big)$, where $\mathcal{N}(n)$ is the graph neighborhood of node $n$ and $\rho$ is a summary function (max, mean, etc.).
- Downsampling via a sampling matrix, selecting a node subset for the next layer.
To retain the mapping between subsampled nodes and graph structure in deeper layers, selection GNNs pad sampled features back to full graph size using a binary sampling matrix. This "bookkeeping" enables further convolutions without losing spatial correspondence; a sketch of the procedure follows below. More generally, hierarchical architectures may be built using bipartite convolutions, explicitly decoupling input and output nodes in each layer (Nassar, 2018). This design supports coarsened (strided) convolutions, expansions (unpooling), and multi-scale information fusion.
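A minimal sketch of this two-step pooling with zero-padding, assuming max summarization over one-hop neighborhoods and degree-based node selection; both choices, and the name `pool_and_pad`, stand in for the selection rules used in practice.
```python
import numpy as np

def pool_and_pad(S, x, num_keep):
    """S: (n, n) adjacency; x: (n,) signal; keep num_keep nodes, pad back to n."""
    n = len(x)
    nbhd = (S != 0) | np.eye(n, dtype=bool)            # one-hop neighborhoods + self
    summarized = np.array([x[nbhd[i]].max() for i in range(n)])  # max over N(i)
    keep = np.argsort(-nbhd.sum(1))[:num_keep]         # e.g., keep high-degree nodes
    C = np.zeros((n, num_keep))                        # binary sampling matrix
    C[keep, np.arange(num_keep)] = 1.0
    x_small = summarized[keep]                         # downsampled signal
    return C @ x_small                                 # zero-padded to full size n
```
Because the padded output lives on the original node set, the next layer can apply filters with the same shift operator $S$.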
4. Alternative Architectural Paradigms: Aggregation and Diffusion
Aggregation GNNs introduce a fundamentally different paradigm for enabling classical CNN operations on graphs (Gama et al., 2018). Instead of direct convolution, the input signal $x$ is diffused by successive multiplications with $S$:
$x,\; Sx,\; S^2 x,\; \ldots,\; S^{K-1} x$
At a designated node $n$, the sequence $([x]_n, [Sx]_n, [S^2 x]_n, \ldots)$ is stacked into a vector, forming a "temporal" representation. Standard 1D CNNs can be applied to this sequence, with the temporal structure encoding neighborhood expansion around $n$ (the $k$-th entry encoding $k$-hop aggregated context). For large graphs, multinode aggregation generalizes this to a set of nodes, each collecting local diffusion tracks, processed by local CNNs and then recombined across the graph.
This procedure both enables processing with familiar CNN modules and provides a novel way to aggregate multi-scale structural information.
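A minimal sketch of this aggregation preprocessing, assuming a single designated node; the resulting vector would be handed to any off-the-shelf 1D CNN, which is omitted here, and the name `diffusion_sequence` is illustrative.
```python
import numpy as np

def diffusion_sequence(S, x, node, length):
    """Stack [x]_node, [Sx]_node, [S^2 x]_node, ... into a 'temporal' vector."""
    seq = np.empty(length)
    for k in range(length):
        seq[k] = x[node]       # the k-th entry aggregates k-hop context
        x = S @ x              # diffuse one more hop
    return seq                 # feed this vector to a standard 1D CNN

# Usage: a length-8 sequence collected at node 0 of a random 20-node graph.
rng = np.random.default_rng(3)
S = (rng.random((20, 20)) < 0.2).astype(float)
z = diffusion_sequence(S, rng.standard_normal(20), node=0, length=8)
```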
5. Comparative Performance and Applications
Empirical evaluation of these architectures demonstrates their utility across diverse domains (Gama et al., 2018):
- Source localization on stochastic block models and real Facebook networks: Multinode aggregation GNNs with spectral-proxy node selection achieve up to 97.3% (SBM) and 99.0% (Facebook) classification accuracy, outperforming graph coarsening approaches.
- Authorship attribution (word adjacency networks): Aggregation GNN variants yield superior performance (≈80.5% accuracy) over selection and graph coarsening methods.
- Text categorization (20NEWS, 1000-node word graph with word2vec edges): The multinode aggregation GNN achieves ≈67.0% accuracy, consistently outperforming both selection GNNs and alternatives.
These results illustrate that both architectures are proper generalizations of CNNs—when applied to circulant graphs, they reduce to the convolution and pooling of standard CNNs. Notably, the careful retention of node position in selection GNNs and the temporal structuring induced by graph diffusion in aggregation GNNs are important for stacking layers and for harnessing hierarchical or multi-scale graph structure.
6. Specialization, Generalization, and Theoretical Insights
Both main paradigms fit within a broader scope of theoretical and practical GNN design:
- Structural generality: Both selection and aggregation GNNs reduce to CNN architectures on regular graphs and extend naturally to arbitrary topologies.
- Architectures like BiGraphNet formalize hierarchical and bipartite relationships, enabling efficient multi-scale computation (Nassar, 2018).
- Feature augmentation: Modern GNNs may integrate edge features, higher-order relations such as factor graphs (Zhang et al., 2019), or even direct NAS-based optimization to discover novel architectures suitable for specialized domains (Gao et al., 2019; Zhao et al., 2020; Wang et al., 2024).
- Expressiveness: Hierarchies based on the nature of aggregation regions (e.g., "walk-based" subgraph aggregation; Li et al., 2019) allow constructing architectures of provably greater discriminative power than first-order message-passing GNNs.
7. Deployment Considerations and Future Directions
Architecture design choices have direct implications for numerical stability, scalability, and deployment:
- Memory and computational costs can be mitigated by pooling/padding strategies (selection GNNs), bipartite convolutions, or local aggregation (multinode aggregation GNNs).
- The choice of base operator (adjacency, Laplacian, normalized variants) and filter depth $K$ governs expressivity and resource trade-offs; common constructions of these operators are sketched after this list.
- In large-scale graphs, multinode aggregation reduces communication and computational burden while retaining local and global context.
- Generalization to edge features, variable-size graphs, and hierarchical decompositions provides pathways for extending these architectures to temporal, multi-layer, or heterogeneous graphs.
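As referenced in the list above, the following is a minimal sketch of common shift-operator constructions from an adjacency matrix $A$; which variant works best is problem-dependent, and the helper name is illustrative.
```python
import numpy as np

def shift_operators(A):
    """Return common shift-operator variants built from adjacency matrix A."""
    d = A.sum(1)                                        # node degrees
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    L = np.diag(d) - A                                  # combinatorial Laplacian
    L_norm = np.eye(len(A)) - D_inv_sqrt @ A @ D_inv_sqrt   # normalized Laplacian
    A_norm = A / np.abs(np.linalg.eigvals(A)).max()     # A scaled by spectral radius
    return {"adjacency": A, "laplacian": L,
            "normalized_laplacian": L_norm, "normalized_adjacency": A_norm}
```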
In summary, the landscape of graph neural network architectures is characterized by rigorous mathematical grounding in graph signal processing, flexible hierarchical and aggregation designs, and strong empirical performance across a spectrum of graph-structured problems. Recent advances prioritize not only expressive power and accuracy but also scalable, efficient computation and adaptability to real-world data heterogeneity.