Hierarchical Graph Neural Network

Updated 26 May 2026

Hierarchical Graph Neural Networks (HGNNs) efficiently model complex data with multi-level structure, enhancing scalability.
They build layered hierarchies, such as capsule-based and community-based ones, to represent data at both fine and coarse resolutions.
HGNNs improve expressiveness and performance across tasks like graph classification, fraud detection, and biomedical applications.

A Hierarchical Graph Neural Network (HGNN) is an architectural paradigm for graph-based machine learning that explicitly models and exploits the multi-level or hierarchical structure inherent in complex data domains. Traditional (“flat”) Graph Neural Networks (GNNs) perform message passing over a single-layer graph. HGNNs instead organize computation across multiple levels of abstraction and recursively extract representations at varying scales, from fine-grained entities (e.g., individual nodes or subgraphs) through progressively coarser “supernodes” (e.g., communities, functional groups, capsules, or semantic units) and potentially up to graph-level summaries. This approach enables efficient modeling of both local and long-range interactions, supports computational scalability, and can inject prior knowledge from predefined or learned hierarchical relations.

1. Hierarchical GNN Architectures

Various HGNN architectures have been proposed, each reflecting a different strategy for constructing and propagating information through multilevel abstractions:

Explicit Layered Hierarchies: Classical HGNNs supplement the original input graph with a hierarchy of auxiliary network layers wherein each layer represents a coarser abstraction (e.g., community contraction, cluster pooling). Computation proceeds both horizontally (within a resolution) and vertically (across resolutions). Node features are updated via both intra-layer message passing and cross-layer aggregation/disaggregation (Sobolevsky, 2021).
Capsule-Based Hierarchies: The hierarchical graph capsule network (HGCN) augments GNNs with layers of capsules at successively coarser scales. Each node is encoded by a disentangled set of capsule factors whose parameters are hierarchically routed to higher-level capsules, using iterative routing-by-agreement and part–whole relationships, mirroring the organization found in capsule networks for images (Yang et al., 2020).
Supergraph and Community Hierarchies: Community-aware HGNNs organize nodes into recursively built supergraphs via unsupervised community detection (e.g., Louvain method), then employ dedicated message-passing flows at and between each level. This design enables O(log n) communication range with computational cost comparable to deep single-level GNNs (Zhong et al., 2020, Rampášek et al., 2021).
Node/Pattern Individualization: Hierarchical Ego-GNNs (HE-GNN) generalize subgraph-based GNNs by alternating “individualization” (i.e., focusing on a node or subgraph of interest) with standard message passing, forming a strict hierarchy with provable gains in graph isomorphism separation power (Soeteman et al., 16 Jun 2025).
Multimodal/Multiscale Hierarchies: In domains where data naturally possesses multiple modalities or scales (e.g., cells↦tissues in histopathology, patients↦cohort in biomedical studies), HGNNs integrate level-specific GNNs (e.g., cell-level, tissue-level, patient-level) whose representations are fused through explicit assignment maps or consistency losses (Pati et al., 2020, Adam et al., 10 Jul 2025).
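The community-based construction described above can be illustrated with a minimal sketch of a single coarsening step. It assumes the community labels are already available (for example from a community-detection step that is not shown), and the function name `contract_communities` and the toy graph are hypothetical, not taken from any of the cited papers.

```python
from collections import defaultdict


def contract_communities(edges, community):
    """Collapse each community into one supernode.

    edges: iterable of (u, v) undirected edges between node ids.
    community: dict mapping node id -> community id (assumed given).
    Returns (members, super_edges), where super_edges maps an unordered
    pair of community ids to the number of original edges crossing it.
    """
    members = defaultdict(list)
    for node, comm in community.items():
        members[comm].append(node)

    super_edges = defaultdict(int)
    for u, v in edges:
        cu, cv = community[u], community[v]
        if cu != cv:  # intra-community edges disappear in the coarse graph
            super_edges[(min(cu, cv), max(cu, cv))] += 1
    return dict(members), dict(super_edges)


# Toy data: two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
community = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
members, super_edges = contract_communities(edges, community)
```

Applying the same function again to the resulting supergraph, with a new set of labels, would yield the next level of the hierarchy.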

2. Computational and Message-Passing Principles

HGNNs extend the message-passing paradigm to propagate information both within and between levels, and they differ in how the hierarchy itself is constructed and coarsened:

Within-level: Each layer performs standard GNN updates (e.g., GCN, GAT, GIN) among nodes/supernodes using local adjacency.
Between levels (Vertical Propagation): Nodes at one level send messages to, or receive messages from, corresponding nodes at adjacent (coarser or finer) levels via aggregation and disaggregation (e.g., pooling, assignment matrices, attention).
Hierarchy Construction: Most HGNNs either (i) predefine the hierarchical structure via domain knowledge (e.g., sector→industry→stock in finance (Yao et al., 2024)), (ii) use unsupervised clustering (e.g., Louvain, SLIC superpixels), (iii) employ learnable pooling mechanisms (DiffPool, EdgePool, stochastic forests), or (iv) learn soft assignments through attention and/or disentanglement objectives (Yang et al., 2020, Cui et al., 26 Sep 2025, Pati et al., 2020).
Routing, Clustering, and Coarsening: HGCN relies on capsule voting and routing to establish how fine-level instantiations are aggregated into coarser capsules, where routing coefficients induce coarsened graph adjacency for the next layer (Yang et al., 2020). Community or cluster-based HGNNs contract subnetworks into supernodes, propagating pooled features upward and distributing contextualized global features downward (Sobolevsky, 2021, Rampášek et al., 2021).
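As a rough illustration of how an assignment matrix induces the next, coarser level, the sketch below pools features as S^T X and adjacency as S^T A S, the form used by DiffPool-style pooling. The helper names and the toy path graph are assumptions made for illustration; the other hierarchy-construction methods named above may coarsen differently.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def coarsen(adj, feats, assign):
    """adj: n x n adjacency, feats: n x d features,
    assign: n x m assignment matrix S (each row sums to 1)."""
    s_t = transpose(assign)
    coarse_feats = matmul(s_t, feats)              # S^T X
    coarse_adj = matmul(matmul(s_t, adj), assign)  # S^T A S
    return coarse_adj, coarse_feats


# Toy data: path 0-1-2-3, nodes {0, 1} -> cluster 0 and {2, 3} -> cluster 1.
# A hard assignment is used here; soft rows such as [0.7, 0.3] work the same way.
adj = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
feats = [[1.0], [3.0], [5.0], [7.0]]
assign = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
coarse_adj, coarse_feats = coarsen(adj, feats, assign)
```

In this form the diagonal of the coarse adjacency accumulates intra-cluster edge weight (each undirected edge counted twice), while off-diagonal entries count the edges between clusters.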

3. Mathematical Formulations and Optimization

The mathematical underpinnings of HGNNs are diverse, reflecting their layered structures:

Hierarchical Update Equations: Let $x_i^k(a)$ denote the feature of node a at level k (resolution k) after update step i, and let $L^k$ denote the set of nodes at level k. The general update is:

$x_{i+1}^{k}(a) = f^k( x_{i}^{k}(a), \, \sum_{b \in L^k} \tilde{A}^k(a,b)x_{i}^{k}(b), \, \sum_{b \in L^{k-1}} H^{k-1 \rightarrow k}(b,a)x_{i}^{k-1}(b), \, \sum_{b \in L^{k+1}} H^{k+1 \rightarrow k}(b,a)x_{i}^{k+1}(b) )$

where $\tilde{A}^k$ is the normalized adjacency at level k and $H^{\cdot \rightarrow \cdot}$ are inter-level assignment matrices (Sobolevsky, 2021).
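A minimal sketch of one synchronous application of this update is given below. The update function $f^k$ is left unspecified in the equation, so the code substitutes a placeholder (an elementwise tanh of the four summed terms); that choice, the function names, and the two-level toy hierarchy are all assumptions for illustration only.

```python
import math


def weighted_sum(weights, feats):
    """Return sum_b weights[b] * feats[b] for vector-valued feats."""
    dim = len(feats[0])
    return [sum(w * f[j] for w, f in zip(weights, feats)) for j in range(dim)]


def column(matrix, a):
    return [row[a] for row in matrix]


def hierarchical_step(x, adj, h_up, h_down):
    """One update of every level.

    x[k]:            list of feature vectors at level k
    adj[k][a][b]:    normalized adjacency at level k
    h_up[k][b][a]:   H^{k -> k+1}(b, a), b at level k, a at level k+1
    h_down[k][b][a]: H^{k+1 -> k}(b, a), b at level k+1, a at level k
    """
    top = len(x) - 1
    new_x = []
    for k, feats in enumerate(x):
        zeros = [0.0] * len(feats[0])
        level = []
        for a in range(len(feats)):
            same = weighted_sum(adj[k][a], feats)
            below = weighted_sum(column(h_up[k - 1], a), x[k - 1]) if k > 0 else zeros
            above = weighted_sum(column(h_down[k], a), x[k + 1]) if k < top else zeros
            # Placeholder for f^k (an assumption): tanh of the summed terms.
            level.append([math.tanh(s + n + b + u)
                          for s, n, b, u in zip(feats[a], same, below, above)])
        new_x.append(level)
    return new_x


# Toy hierarchy: two connected fine nodes and one supernode above them.
x = [[[1.0], [0.0]], [[0.5]]]
adj = [[[0.0, 1.0], [1.0, 0.0]], [[0.0]]]
h_up = [[[0.5], [0.5]]]   # mean pooling from level 0 to level 1
h_down = [[[1.0, 1.0]]]   # broadcast from level 1 to level 0
new_x = hierarchical_step(x, adj, h_up, h_down)
```

A trained model would replace the placeholder with learned, level-specific transformations; the point of the sketch is only the routing of the intra-level, bottom-up, and top-down terms.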

Capsule Instantiation and Routing: In HGCN, disentangled graph capsules are computed per node:

$z_{i,k} = \sigma(W_k^\top x_i + b_k)\,, \quad z_i = [z_{i,1} ; \ldots ; z_{i,K}]$

$u_i^{(1)} = \mathrm{squash}(z_i) = \frac{\|z_i\|^2}{1+\|z_i\|^2} \cdot \frac{z_i}{\|z_i\|}$

Hierarchical routing then clusters lower-level votes $v_{j|i}^{(\ell)}$ into coarser capsules via softmax assignment and squashing nonlinearity (Yang et al., 2020).
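The squash nonlinearity above, together with a softmax-based routing loop, can be sketched as follows. The routing shown is the generic routing-by-agreement scheme from the capsule-network literature, used here as a stand-in; it is not claimed to reproduce HGCN's exact procedure, and the function names and toy votes are hypothetical.

```python
import math


def squash(z):
    """Scale z so its length lies in [0, 1) while keeping its direction."""
    norm_sq = sum(c * c for c in z)
    if norm_sq == 0.0:
        return [0.0] * len(z)
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq)
    return [scale * c for c in z]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route(votes, iterations=3):
    """votes[i][j] is the vote of lower capsule i for upper capsule j."""
    n_low, n_up, dim = len(votes), len(votes[0]), len(votes[0][0])
    logits = [[0.0] * n_up for _ in range(n_low)]
    for _ in range(iterations):
        # Each lower capsule distributes its vote mass over upper capsules.
        coeffs = [softmax(row) for row in logits]
        upper = [squash([sum(coeffs[i][j] * votes[i][j][d] for i in range(n_low))
                         for d in range(dim)])
                 for j in range(n_up)]
        # Agreement (dot product) between a vote and the output raises its logit.
        for i in range(n_low):
            for j in range(n_up):
                logits[i][j] += sum(v * u for v, u in zip(votes[i][j], upper[j]))
    return upper, coeffs


# Toy votes: three lower capsules agree strongly on upper capsule 0.
votes = [[[1.0, 0.0], [0.0, 0.1]],
         [[0.9, 0.1], [0.1, 0.0]],
         [[1.0, 0.1], [0.0, -0.1]]]
upper, coeffs = route(votes)
```

The returned coefficients play the role of a soft assignment from fine to coarse capsules, which is what allows them to define the connectivity of the next level.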

Hierarchy-Aware Losses: Objectives integrate level-specific losses (e.g., margin loss on class capsules, auxiliary reconstruction), consistency losses aligning fine/coarse embeddings, and, when relevant, entropy or regularization terms to promote disentanglement (Yang et al., 2020, Adam et al., 10 Jul 2025, Pati et al., 2020).
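One way such an objective might be assembled is sketched below, combining a capsule margin loss with a fine/coarse consistency term. The margin-loss form and its constants (0.9, 0.1, 0.5) are the ones commonly used in the capsule-network literature; the squared-distance consistency term, the weight `alpha`, and all function names are assumptions for illustration rather than the losses of any specific cited model.

```python
def margin_loss(lengths, target, m_pos=0.9, m_neg=0.1, down_weight=0.5):
    """Margin loss over class-capsule lengths for a single example."""
    loss = 0.0
    for cls, length in enumerate(lengths):
        if cls == target:
            loss += max(0.0, m_pos - length) ** 2
        else:
            loss += down_weight * max(0.0, length - m_neg) ** 2
    return loss


def consistency_loss(fine, coarse, parent):
    """Mean squared distance between each fine embedding and the
    coarse embedding of the supernode it is assigned to."""
    total = 0.0
    for i, vec in enumerate(fine):
        total += sum((a - b) ** 2 for a, b in zip(vec, coarse[parent[i]]))
    return total / len(fine)


def total_loss(lengths, target, fine, coarse, parent, alpha=0.1):
    """Weighted sum of the task loss and the cross-level consistency term."""
    return (margin_loss(lengths, target)
            + alpha * consistency_loss(fine, coarse, parent))


# Toy data: two class capsules, two fine nodes sharing one supernode.
fine = [[1.0, 0.0], [0.0, 1.0]]
coarse = [[0.5, 0.5]]
parent = [0, 0]
loss = total_loss([0.5, 0.3], 0, fine, coarse, parent)
```

Reconstruction or entropy terms, where a model uses them, would enter the same weighted sum as additional addends.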

4. Model Variants and Domain Adaptations

HGNNs have been applied and tailored to diverse domains:

Text and NLP: Hierarchical modeling of word–sentence–document graphs using separate GNNs per scale, with fusion via learned or rule-based weighting. Coding-tree based approaches utilize hierarchical clustering via structural entropy minimization (Zhang et al., 2021, Hua et al., 2022).
Biomedicine and Histopathology: Hierarchical architectures model molecular/interactome graphs at the patient level, then aggregate into cohort-wide relational graphs. In image analysis, cell–superpixel (cell–tissue) hierarchies use sequential GINs/GCNs combined at the assignment boundaries (Pati et al., 2020, Adam et al., 10 Jul 2025).
Program Analysis and Reverse Engineering: Cross-architectural binary function analysis is addressed using graph-of-graphs (control-flow subgraphs within function-call graphs), with hierarchical two-stage message passing and siamese loss for identification tasks (Yu et al., 2023).
Fraud Detection and Recommendation: Hierarchical attention mechanisms stack relation-level attention with neighborhood-level and information-fusion modules to detect anomalous behaviors (Liu et al., 2022). Multilevel user modeling decomposes sequential user–item graphs into “factor” representations through soft clustering and timespan-aware affinity graphs (Xue et al., 2022).
Network Science, Mobility, and Social Networks: HGNNs constructed via community detection or randomized forest decompositions reduce the required number of message passing steps to O(log n), supporting tractable long-range representation learning on large graphs (Rampášek et al., 2021, Cui et al., 26 Sep 2025, Sobolevsky, 2021).

5. Expressiveness and Theoretical Properties

HGNN architectures can strictly enhance expressive power relative to flat GNNs:

Expressivity Hierarchies: The D–L hierarchy classifies GNN architectures by their aggregation regions (Dₖ: k-hop neighborhood, Lₖ: k-hop with shell edges), with strict increases in power as k increases; certain hierarchical GNNs can exceed the Weisfeiler–Lehman test’s graph discrimination ability (Li et al., 2019, Soeteman et al., 16 Jun 2025).
Logical Characterization: HE-GNNs correspond to graded hybrid logics GML(↓), with depth-d HE-GNN capturing d binder nestings. These hierarchies interpolate between standard 1-WL GNNs and higher-order logic-based models, achieving full graph-isomorphism separation at sufficient depth (Soeteman et al., 16 Jun 2025).
Communication Range: Hierarchical coarsening or capsule-based pooling enables O(log n) communication paths, allowing networks to efficiently encode long-range dependencies and global context (Rampášek et al., 2021, Zhong et al., 2020).
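The O(log n) communication range can be made concrete with a small arithmetic sketch. It compares the worst case on a flat path graph with an idealized balanced hierarchy in which every level merges a fixed number of nodes; both the balanced-hierarchy assumption and the function names are illustrative, and hierarchies built by community detection are generally not this regular.

```python
import math


def flat_hops(n):
    """Worst-case number of hops between two nodes on a path of n nodes."""
    return n - 1


def hierarchical_hops(n, branching=2):
    """Hops needed to climb to a shared ancestor and descend again,
    assuming each level merges `branching` nodes into one supernode."""
    levels = 0
    size = n
    while size > 1:
        size = math.ceil(size / branching)
        levels += 1
    return 2 * levels


n = 1024
flat = flat_hops(n)
hier = hierarchical_hops(n)
```

Under these assumptions the flat model needs a number of message-passing steps linear in n to connect the two ends of the path, while the hierarchical route grows with the number of levels, which is logarithmic in n.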

6. Empirical Results and Application Benchmarks

HGNNs report gains over flat or non-hierarchical baselines across a range of tasks and domains:

Model/Paper	Domain	Reported Gains/Key Metrics
HGCN (Yang et al., 2020)	Graph classification	Best accuracy on 10/11 benchmarks; +16–20% on ENZYMES over baseline capsule GNNs
Athena (Adam et al., 10 Jul 2025)	Biomedicine	AUC: 0.747 → 0.812 (+13% rel.), F1: +20% rel. vs. non-graph models
CFG2VEC (Yu et al., 2023)	Reverse Engineering	Precision@1: +24.6% vs. Debin; 97% with 4-arch data
HA-GNN (Liu et al., 2022)	Fraud Detection	AUC: YelpChi 85.67% (baseline CARE-GNN: 75.70%)
HC-GNN (Zhong et al., 2020), HGNet (Rampášek et al., 2021)	Node/graph classification	Consistent 1–16% micro-F1 gain, robust to sparsity
SHAKE-GNN (Cui et al., 26 Sep 2025)	Large-graph classification	97–99% of flat GNN accuracy with ~50% reduced training time

Ablation studies in these works report that removing key hierarchical components (e.g., capsule disentanglement, cross-level connections, hierarchical fusion) degrades both accuracy and robustness, which supports the contribution of the hierarchy itself to the reported gains.

7. Extensions, Limitations, and Open Directions

Key areas for ongoing investigation include:

Learned vs. Fixed Hierarchies: End-to-end learning of partition/assignment matrices (DiffPool, attention-based clustering) as alternatives to predefined or unsupervised hierarchies can enhance adaptability but may introduce additional optimization challenges (Sobolevsky, 2021, Cui et al., 26 Sep 2025).
Scalability: Stochastic or spectral approaches (e.g., Kirchhoff Forests (Cui et al., 26 Sep 2025)) target massive graphs, retaining most of the accuracy of flat GNNs (97–99% reported for SHAKE-GNN) at roughly half the training time.
Interpretability: Advances in explainable HGNNs (e.g., subnetwork extraction, cross-level attribution) facilitate mechanistic understanding of predictions, particularly in biomedicine (Adam et al., 10 Jul 2025).
Flexible Modality Integration: Hierarchical frameworks naturally support multimodal and multi-relational data, enabling improvements for disease classification, recommendation, and cross-domain matching (Adam et al., 10 Jul 2025, Taghibakhshi et al., 2023).
Limitations: Increased complexity, hyperparameter sensitivity (hierarchy depth, capsule count), computational overhead for hierarchy construction, and the potential for overfitting in parameter-rich designs are active concerns.
Future Directions: Stacking deeper or adaptive hierarchies, integrating transformer-based sequence models, jointly learning hierarchical graph structures, and developing theoretical foundations relating logic, expressivity, and generalization remain central open problems.

For an extensive treatment of underlying methodologies and empirical evidence, see "Hierarchical Graph Capsule Network" (Yang et al., 2020), "Stock Type Prediction Model Based on Hierarchical Graph Neural Network" (Yao et al., 2024), "CFG2VEC: Hierarchical Graph Neural Network for Cross-Architectural Software Reverse Engineering" (Yu et al., 2023), "Hierarchical information matters: Text classification via tree based graph neural network" (Zhang et al., 2021), "Atherosclerosis through Hierarchical Explainable Neural Network Analysis" (Adam et al., 10 Jul 2025), "Deep Multi-Task Augmented Feature Learning via Hierarchical Graph Neural Network" (Guo et al., 2020), "Improving Fraud Detection via Hierarchical Attention-based Graph Neural Network" (Liu et al., 2022), and related works cited above.