Cooperative Sheaf Neural Networks (CSNNs)
- CSNNs are graph neural networks that leverage cellular sheaf theory to enable directed, cooperative communication among nodes.
- They allow nodes to independently control broadcasting or listening, thereby mitigating oversquashing and supporting long-range dependency modeling.
- Empirical results show CSNNs achieve state-of-the-art performance on both synthetic bottleneck tasks and real-world heterophilic graphs.
Cooperative Sheaf Neural Networks (CSNNs) are a class of graph neural networks (GNNs) that extend sheaf-based diffusion and message passing to directed graphs while introducing cooperative communication actions at the node level. By combining cellular sheaves, a construction from homological algebra, with recent work on cooperative message passing, CSNNs allow each node to decide independently whether it broadcasts information to its neighbors, gathers information from them, both, or neither. This design gives precise control over information flow, supports long-range dependency modeling, and mitigates oversquashing in graph learning. CSNNs generalize earlier sheaf neural network frameworks to directed graphs and report state-of-the-art performance on synthetic bottleneck tasks and real-world heterophilic graphs (Ribeiro et al., 1 Jul 2025).
1. Background: Sheaf Neural Networks and Graph Diffusion
Sheaf Neural Networks (SheafNNs) generalize standard graph convolutional networks (GCNs) by replacing scalar-valued Laplacian-based diffusion on graphs with richer, stalk-valued diffusions parameterized by cellular sheaves (Hansen et al., 2020). In SheafNNs, each node and edge is associated with an ambient vector space (the “stalk”). Restriction maps control how signal assignments propagate between these stalks, and the sheaf Laplacian orchestrates global diffusion.
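For reference, the sheaf Laplacian on an undirected graph is commonly written as below. The notation is the usual one from the sheaf-diffusion literature and is introduced here for illustration rather than taken from this text: $\mathcal{F}_{v \trianglelefteq e}$ is the restriction map from the stalk of node $v$ to the stalk of an incident edge $e$, and $x_v$ is the signal on the stalk of $v$.

```latex
(L_{\mathcal{F}}\, x)_v \;=\; \sum_{e = \{u, v\}}
  \mathcal{F}_{v \trianglelefteq e}^{\top}
  \bigl( \mathcal{F}_{v \trianglelefteq e}\, x_v - \mathcal{F}_{u \trianglelefteq e}\, x_u \bigr),
\qquad
\dot{x}(t) \;=\; -\,L_{\mathcal{F}}\, x(t).
```

When every stalk is $\mathbb{R}$ and every restriction map is the identity, the sum reduces to $\sum_{u \sim v} (x_v - x_u)$, the ordinary graph Laplacian.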
This approach naturally encodes complex, nonconstant, asymmetric, or signed relationships, outperforming GCNs where such edge heterogeneity is intrinsic. However, prior to CSNNs, this machinery applied only to undirected graphs and lacked explicit node-level cooperation mechanisms (Hansen et al., 2020).
2. Cellular Sheaves for Directed Graphs
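CSNNs introduce the first formulation of cellular sheaves over directed graphs designed for cooperative information routing. In the CSNN schema, each node $i$ and each directed edge $e$ carries a feature space (stalk), $\mathcal{F}(i)$ and $\mathcal{F}(e)$ respectively. For each directed edge $e = (i, j)$, with incidences $i \trianglelefteq e$ (where $i$ is the source) and $j \trianglelefteq e$ (where $j$ is the target), linear restriction maps are defined:
- $\mathcal{F}_{i \trianglelefteq e} : \mathcal{F}(i) \to \mathcal{F}(e)$ (source map)
- $\mathcal{F}_{j \trianglelefteq e} : \mathcal{F}(j) \to \mathcal{F}(e)$ (target map)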
General directed sheaves involve a large parameter set, since every node-edge incidence carries its own map. For tractability, CSNNs employ a “flat-bundle” specialization: at each node $i$, all source restriction maps share a conformal matrix $S_i$, and all target maps share a conformal matrix $T_i$, yielding two learnable $d \times d$ maps per node, where $d$ is the stalk dimension.
This construction directly addresses the central limitation of non-cooperative sheaf diffusions: the lack of message directionality and selective action (Ribeiro et al., 1 Jul 2025).
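As a minimal sketch of what a per-node conformal map could look like, the code below builds a matrix as a scale times a single Householder reflection and checks that its Gram matrix is a multiple of the identity. The function names, the use of one reflection as the orthogonal factor, and the convention that a zero scale yields the zero map are assumptions made for illustration, not the parameterization of the source.

```python
def householder(v):
    """Return the reflection H = I - 2 v v^T / (v^T v) as a list of rows."""
    d = len(v)
    norm_sq = sum(x * x for x in v)
    if norm_sq == 0.0:
        raise ValueError("Householder vector must be nonzero")
    return [[(1.0 if r == c else 0.0) - 2.0 * v[r] * v[c] / norm_sq
             for c in range(d)] for r in range(d)]


def conformal(scale, v):
    """Scaled reflection scale * H (hypothetical helper); scale = 0 gives the zero map."""
    return [[scale * h for h in row] for row in householder(v)]


def gram(m):
    """Return M^T M for a square matrix given as a list of rows."""
    d = len(m)
    return [[sum(m[k][r] * m[k][c] for k in range(d)) for c in range(d)]
            for r in range(d)]


# Hypothetical source and target maps for one node with 3-dimensional stalks.
S_i = conformal(2.0, [1.0, 2.0, 2.0])   # active source map
T_i = conformal(0.0, [0.0, 1.0, 0.0])   # target map switched off

gram_S = gram(S_i)   # numerically close to 4 * identity (scale squared)
```

Because the Gram matrix is `scale**2` times the identity, such a map preserves angles and rescales lengths uniformly, which is the defining property of a conformal linear map.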
3. Directed Sheaf Laplacians and Cooperative Message Passing
Extending the classical coboundary operator and Laplacian to directed graphs, CSNNs define two non-symmetric Laplacians:
- The out-degree sheaf Laplacian $L^{\text{out}}_{\mathcal{F}}$
- The in-degree sheaf Laplacian $L^{\text{in}}_{\mathcal{F}}$
In the flat-bundle case, where restriction maps decompose via per-node source maps $S_i$ and target maps $T_i$, these Laplacians simplify to expressions that depend only on the maps $S_j$, $T_j$ of the nodes in each neighborhood.
A key architectural choice in CSNNs is to perform message passing via a composition of the in-degree and out-degree sheaf Laplacians, which discretizes a directed version of the heat equation. Learnable local maps enable each node to “turn off” broadcasting, gathering, both, or neither, by zeroing its source map $S_i$ (disabling outgoing messages) and/or its target map $T_i$ (disabling incoming messages). This operational flexibility underlies the cooperative architecture (Ribeiro et al., 1 Jul 2025).
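The gating effect can be illustrated with a deliberately simplified gather step. This is not the CSNN Laplacian composition: it assumes one-dimensional stalks, so that each source map `S[j]` and target map `T[i]` is a scalar, and it assumes that the message along an edge `j -> i` is `T[i] * S[j] * x[j]`. The function name `aggregate` is hypothetical.

```python
def aggregate(edges, S, T, x):
    """One simplified gather step on a directed graph.

    edges: list of (source, target) pairs.
    S, T:  per-node scalar source and target maps (1-dimensional stalks).
    Returns, for every node i, the sum of T[i] * S[j] * x[j] over edges j -> i.
    """
    out = [0.0] * len(x)
    for j, i in edges:
        out[i] += T[i] * S[j] * x[j]
    return out


edges = [(0, 1), (1, 2), (0, 2)]
x = [1.0, 10.0, 100.0]
ones = [1.0, 1.0, 1.0]

both = aggregate(edges, S=ones, T=ones, x=x)               # [0.0, 1.0, 11.0]
mute_0 = aggregate(edges, S=[0.0, 1.0, 1.0], T=ones, x=x)  # [0.0, 0.0, 10.0]
deaf_2 = aggregate(edges, S=ones, T=[1.0, 1.0, 0.0], x=x)  # [0.0, 1.0, 0.0]
```

Zeroing the source map of node 0 removes its contribution from every neighbor, while zeroing the target map of node 2 leaves that node with no incoming signal, whatever its neighbors send.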
4. Layer Structure, Update Mechanism, and Action Modes
CSNN layers operate by iteratively learning nodewise source and target maps and propagating signals via normalized Laplacian compositions. At each layer $t$:
- Compute new conformal maps $S_i^{(t)}$, $T_i^{(t)}$ for each node $i$ (via local networks or Householder layers)
- Construct and normalize the Laplacians $L^{\text{out}}_{\mathcal{F}}$, $L^{\text{in}}_{\mathcal{F}}$
- Update node features by applying the composed Laplacians together with a learnable self-loop vector $\varepsilon$, a pointwise nonlinearity $\sigma$, and weight matrices $W_1$, $W_2$.
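One update rule consistent with this description, written by analogy with the residual update used in neural sheaf diffusion, is sketched below. It is an assumption rather than a transcription of the source: $X^{(t)}$ denotes the stacked node features at layer $t$, $\mathcal{L}^{(t)}$ stands for the composition of the two normalized directed sheaf Laplacians built from the layer's maps (the order of composition and any transposes are left unspecified), $n$ is the number of nodes, and $\varepsilon$ is broadcast across nodes.

```latex
X^{(t+1)} \;=\; (1 + \varepsilon) \odot X^{(t)}
\;-\; \sigma\!\Bigl( \mathcal{L}^{(t)} \,\bigl(I_n \otimes W_1^{(t)}\bigr)\, X^{(t)}\, W_2^{(t)} \Bigr)
```

In this form, a node whose row of $\mathcal{L}^{(t)}$ vanishes keeps a rescaled copy of its own features, up to the constant $\sigma(0)$, which is how a residual update of this shape would accommodate nodes that choose not to listen.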
Nodes thereby independently choose among four action modes per layer:
- Standard: source map $S_i$ and target map $T_i$ both nonzero (both broadcast and listen)
- Listen-only: $S_i = 0$, $T_i \neq 0$
- Broadcast-only: $S_i \neq 0$, $T_i = 0$
- Isolate: $S_i = T_i = 0$
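The four modes can be read off a node's maps directly. The helper below is a hypothetical diagnostic, not part of the published model: it treats a map as "off" when all of its entries fall within a tolerance `tol` of zero, an assumption made because learned maps are rarely exactly zero.

```python
def action_mode(S, T, tol=1e-8):
    """Classify a node's action from its source map S and target map T.

    S and T are square matrices given as lists of rows. A map counts as
    zero when every entry has absolute value at most tol.
    """
    def is_zero(m):
        return all(abs(v) <= tol for row in m for v in row)

    broadcasts = not is_zero(S)   # nonzero source map: node sends messages
    listens = not is_zero(T)      # nonzero target map: node receives messages
    if broadcasts and listens:
        return "standard"
    if listens:
        return "listen-only"
    if broadcasts:
        return "broadcast-only"
    return "isolate"


identity = [[1.0, 0.0], [0.0, 1.0]]
zero = [[0.0, 0.0], [0.0, 0.0]]

modes = [
    action_mode(identity, identity),  # "standard"
    action_mode(zero, identity),      # "listen-only"
    action_mode(identity, zero),      # "broadcast-only"
    action_mode(zero, zero),          # "isolate"
]
```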
Sparsification of communication, both spatially and along a path, results from selective activation of these maps. Nodes can attend to or ignore information along arbitrarily chosen routes (Ribeiro et al., 1 Jul 2025).
5. Theoretical Properties: Receptive Field, Oversquashing, and Selectivity
The CSNN model supports several provable properties relevant to graph signal propagation and bottleneck phenomena:
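- Cooperative Property: Let $S_i$ and $T_i$ denote the source and target maps of node $i$. If $T_i = 0$ for some node $i$, then the updated feature of $i$ does not depend on the features of any other node: node $i$ listens to nobody. If $S_j = 0$ for some node $j$, then the updated feature of every node $i \neq j$ is independent of the feature of $j$: node $j$ broadcasts to nobody.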
- 2t-hop Reachability: After $t$ layers, a node’s updated feature can depend on nodes at directed distance up to $2t$, as shown by induction on the iterated composition structure.
- Selective Long-range Attention: For nodes $i$ and $j$ separated by directed distance $k$, the network can learn maps such that $i$ receives information exclusively from $j$ after a number of layer updates determined by $k$, with all intervening nodes and paths inhibited by zeroing the corresponding source maps $S$ or target maps $T$. This construction enables path-specific targeted routing and mitigates the oversquashing prevalent in standard GNNs (Ribeiro et al., 1 Jul 2025).
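The 2t-hop bound can be checked on a toy graph by tracking only which entries of the propagation operator may be nonzero. The sketch assumes that each layer composes two one-hop operators, each supported on self-loops and directed edges. It ignores feature values, nonlinearities, and gating, so it gives an upper bound on the receptive field. All function names are hypothetical.

```python
def one_hop_support(n, edges):
    """Boolean pattern: entry [i][j] is True if i == j or there is an edge j -> i."""
    m = [[i == j for j in range(n)] for i in range(n)]
    for j, i in edges:
        m[i][j] = True
    return m


def bool_matmul(a, b):
    """Boolean matrix product: support of the product of two operators."""
    n = len(a)
    return [[any(a[i][k] and b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def receptive_field(n, edges, layers):
    """Support after `layers` layers, each composing two one-hop operators."""
    hop = one_hop_support(n, edges)
    layer = bool_matmul(hop, hop)   # one layer spans up to two hops
    total = [[i == j for j in range(n)] for i in range(n)]
    for _ in range(layers):
        total = bool_matmul(layer, total)
    return total


n = 6
path = [(k, k + 1) for k in range(n - 1)]   # directed path 0 -> 1 -> ... -> 5
field = receptive_field(n, path, layers=2)
sources_of_5 = [j for j in range(n) if field[5][j]]   # [1, 2, 3, 4, 5]
```

With two layers, node 5 can be influenced by nodes up to directed distance 4, so node 0 at distance 5 stays outside its receptive field.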
6. Empirical Results and Comparative Benchmarks
CSNNs have been evaluated on both synthetic tasks designed to probe oversquashing and on real-world heterophilic node-labeling benchmarks:
- Oversquashing and Signal Transmission: On NeighborsMatch (binary tree root classification), CSNNs retain 100% accuracy at tree depths where the GCN, GIN, GAT, and GGNN baselines have already deteriorated (by depth 5). This result indicates that the model can preserve long-range dependencies without lossy feature compression (Ribeiro et al., 1 Jul 2025).
- Node Classification on Heterophilic Graphs: Across multiple datasets (Roman-empire, Amazon-ratings, Minesweeper, Tolokers, Questions), CSNNs demonstrate best-in-class performance (accuracy or ROC-AUC), with reported margins of 1–5 points over NSD. Against the strongest baseline, BuNN (Bundle NN), the margins on the four datasets tabulated below are under one point:
| Dataset | Metric | Best CSNN (%) | Best baseline (%) | Baseline model |
|---|---|---|---|---|
| Minesweeper | ROC-AUC | 99.07 | 98.99 | BuNN |
| Roman-empire | Accuracy | 92.63 | 91.75 | BuNN |
| Tolokers | ROC-AUC | 85.45 | 84.78 | BuNN |
| Questions | ROC-AUC | 79.31 | 78.75 | BuNN |
- Ablation with a Directed Sheaf Network (DSN): Replacing per-node conformal maps with orthogonal bundles (no action flexibility) leads to a 0.3–1.3 point performance drop on four out of five benchmarks, which supports the value of node-level source/target adaptivity (Ribeiro et al., 1 Jul 2025).
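The margins over BuNN implied by the table above can be computed directly. The snippet only restates the tabulated scores. The dictionary layout and variable names are illustrative.

```python
# (best CSNN score, best baseline score) in percent, copied from the table.
scores = {
    "Minesweeper": (99.07, 98.99),
    "Roman-empire": (92.63, 91.75),
    "Tolokers": (85.45, 84.78),
    "Questions": (79.31, 78.75),
}

# Margin of CSNN over the best baseline, rounded to two decimals.
margins = {name: round(csnn - baseline, 2)
           for name, (csnn, baseline) in scores.items()}

largest = max(margins, key=margins.get)    # dataset with the widest margin
smallest = min(margins, key=margins.get)   # dataset with the narrowest margin
```

On these four datasets the margins are 0.08, 0.88, 0.67, and 0.56 points, widest on Roman-empire and narrowest on Minesweeper.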
7. Context: Relation to Prior SheafNNs and Directions for Extension
Earlier SheafNNs operated on undirected graphs using sheaf Laplacians to enable stalk-based message passing and were able to outperform GCNs when edge relations were nonconstant, signed, or asymmetric (Hansen et al., 2020). However, these architectures did not provide directional, node-selective, or pathwise control, which limited their ability to mimic truly cooperative multi-agent communication protocols.
CSNNs extend this by:
- Introducing directed graph sheaf Laplacians and composition-based architectures.
- Enabling independent nodewise broadcast/listen controls.
- Supporting targeted multi-hop communication and mitigating aggregation bottlenecks.
- Achieving stronger performance on tasks where standard aggregation-based graph neural approaches, regardless of advanced attention or pooling, are fundamentally constrained by oversquashing and feature dilution.
A plausible implication is that the integration of directionality, node-level action, and sheaf theory establishes a new class of architectures for structured cooperative learning over arbitrary data graphs, especially in heterophilic and complex real-world domains. The flat-bundle specialization balances model expressivity and tractability; further research could explore richer sheaf parameterizations, adaptive layer morphisms, or multi-sheaf ensembles to enhance flexibility and scalability (Ribeiro et al., 1 Jul 2025, Hansen et al., 2020).