Over-Squashing Bound in GNNs

Updated 3 October 2025
  • The over-squashing bound quantifies how the influence of distant node features on a node's representation decays exponentially during message passing, measured via the Jacobian sensitivity of hidden states to remote inputs.
  • It is characterized by structural bottlenecks, where exponential growth in the number of $k$-hop neighbors meets narrow connectivity, drastically attenuating long-range signals.
  • Curvature-guided rewiring methods such as SDRF surgically alleviate these bottlenecks, improving the capture of long-range dependencies in graph neural networks.

Over-squashing is a phenomenon in graph neural networks (GNNs) in which information from nodes that are distant in the input graph becomes exponentially compressed during message passing, resulting in vanishing sensitivity of node representations to remote input features. This exponential decay of influence occurs as messages from an exponentially expanding set of $k$-hop neighbors are aggregated and routed through structural bottlenecks in the graph, particularly in settings with limited connectivity. Over-squashing fundamentally constrains the expressive power of GNNs for tasks that require integrating long-range or global information.

1. Formal Definition and Jacobian Bound

Over-squashing is formally measured by the sensitivity of a node's hidden representation to distant inputs, given by the Jacobian $\left|\partial h_i^{(\ell)} / \partial x_s\right|$. For a message passing GNN layer,

$$h_i^{(\ell+1)} = \phi_\ell \left( h_i^{(\ell)}, \sum_j \hat{A}_{ij}\, \psi_\ell\left(h_i^{(\ell)}, h_j^{(\ell)}\right) \right)$$

if $\phi_\ell$ and $\psi_\ell$ have derivatives bounded by $\alpha$ and $\beta$ respectively, then for nodes $i$ and $s$ at distance $r+1$:

$$\left| \frac{\partial h_i^{(r+1)}}{\partial x_s} \right| \leq (\alpha \beta)^{r+1} \left( \hat{A}^{r+1} \right)_{is}$$

where $\hat{A}$ is the normalized adjacency matrix with self-loops. In graphs where the number of $k$-hop neighbors grows rapidly with $k$ (such as trees or small-world graphs), $\left( \hat{A}^{r+1} \right)_{is}$ decays extremely quickly, establishing a quantitative upper bound for over-squashing: the influence of distant node features on the target node representation becomes exponentially small as $r$ increases.
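
The decay can be checked numerically. The following minimal sketch (the helper names are ours, not from the paper) builds the normalized adjacency with self-loops for a complete binary tree and prints the entry $(\hat{A}^{r})_{0,s}$ for a node $s$ at distance $r$ from the root; the entries shrink roughly geometrically with $r$:

```python
import numpy as np

def normalized_adjacency(A):
    """D^{-1/2} (A + I) D^{-1/2}: normalized adjacency with self-loops."""
    A_hat = A + np.eye(len(A))
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt

def binary_tree_adjacency(depth):
    """Complete binary tree with 2**(depth+1) - 1 nodes, heap-indexed."""
    n = 2 ** (depth + 1) - 1
    A = np.zeros((n, n))
    for child in range(1, n):
        parent = (child - 1) // 2
        A[parent, child] = A[child, parent] = 1.0
    return A

A_hat = normalized_adjacency(binary_tree_adjacency(depth=8))
for r in range(1, 9):
    s = 2 ** r - 1   # leftmost node at depth r, i.e. distance r from root 0
    entry = np.linalg.matrix_power(A_hat, r)[0, s]
    print(f"r = {r}: (A_hat^r)[0, {s}] = {entry:.3e}")
```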

This theoretical bound is the foundation for recent analyses of over-squashing in message passing GNNs (Topping et al., 2021).

2. Structural Bottlenecks and k-hop Neighborhoods

Graph bottlenecks arise when very few edges connect large subgraphs or communities, forcing many long-range messages to be channeled through a small set of “bottleneck” edges. Even if a node has access to an exponentially increasing set of $r$-hop neighbors, the aggregation pathway is restricted by these narrow connecting regions. As a result, the aggregated message must compress information from an exponentially sized set of distant nodes, inherently limiting the capacity available for long-range dependencies.

The magnitude of $\left( \hat{A}^{r+1} \right)_{is}$ reflects how well node $i$ is connected to node $s$ within $r+1$ steps, and it can become vanishingly small in the presence of severe bottlenecks (e.g., tree-like structures or “dumbbell” graphs), which are precisely the topological configurations most affected by over-squashing.
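
To see the bottleneck effect concretely, consider a dumbbell of two $m$-cliques joined by a single bridge edge. In this minimal sketch (the construction and names are ours), the entry of $\hat{A}^3$ connecting two nodes on opposite sides of the bridge, exactly three hops apart, collapses toward zero as the cliques grow:

```python
import numpy as np

def normalized_adjacency(A):
    A_hat = A + np.eye(len(A))
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt

def dumbbell(m):
    """Two m-cliques joined by a single bridge edge (node m-1 -- node m)."""
    n = 2 * m
    A = np.zeros((n, n))
    A[:m, :m] = 1.0 - np.eye(m)        # first clique
    A[m:, m:] = 1.0 - np.eye(m)        # second clique
    A[m - 1, m] = A[m, m - 1] = 1.0    # the bottleneck edge
    return A

for m in (5, 10, 20, 40):
    P3 = np.linalg.matrix_power(normalized_adjacency(dumbbell(m)), 3)
    # nodes 0 and 2m-1 lie in different cliques, exactly three hops apart
    print(f"m = {m:2d}: (A_hat^3)[0, {2*m - 1}] = {P3[0, 2*m - 1]:.3e}")
```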

3. Edge-Based Combinatorial Curvature and Bottlenecks

To diagnose and attribute over-squashing to specific regions of the graph, the Balanced Forman curvature $\text{Ric}(i, j)$ was introduced as an edge-based combinatorial analog of Ricci curvature. For unweighted graphs, this curvature depends on the degrees of $i$ and $j$, the number of triangles (shared neighbors) based at the edge, and terms counting 4-cycles. High (positive) curvature corresponds to well-connected edges embedded in triangles and cycles, while strongly negatively curved edges identify bridge-like bottlenecks connecting otherwise sparsely linked components.
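
As an illustration, the degree and triangle terms of this curvature are easy to compute from the adjacency matrix. The sketch below is a simplification made for brevity: it omits the 4-cycle terms of the full Balanced Forman definition, so it is only a rough proxy, but it already separates the bridge edge from intra-clique edges on a small dumbbell graph:

```python
import numpy as np

def curvature_triangle_part(A, i, j):
    """Degree and triangle terms of Balanced Forman curvature for edge (i, j).
    NOTE: the 4-cycle terms of the full definition are omitted here."""
    d_i, d_j = A[i].sum(), A[j].sum()
    if min(d_i, d_j) <= 1:
        return 0.0
    tri = float((A[i] * A[j]).sum())   # common neighbors = triangles on (i, j)
    return (2.0 / d_i + 2.0 / d_j - 2.0
            + 2.0 * tri / max(d_i, d_j)
            + tri / min(d_i, d_j))

# Dumbbell of two 5-cliques bridged by the edge (4, 5).
m, n = 5, 10
A = np.zeros((n, n))
A[:m, :m] = 1.0 - np.eye(m)
A[m:, m:] = 1.0 - np.eye(m)
A[4, 5] = A[5, 4] = 1.0

print("bridge (4,5):", curvature_triangle_part(A, 4, 5))   # -1.2  (negative: bottleneck)
print("clique (0,1):", curvature_triangle_part(A, 0, 1))   #  1.25 (positive: well connected)
```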

Theoretical results demonstrate that:

  • Any edge with $\text{Ric}(i,j) \leq -2+\delta$ for sufficiently small $\delta$ is associated with a bottleneck,
  • The Jacobian (sensitivity) of node features with respect to distant inputs sharply decreases across bottleneck edges, with the decay controlled in part by the amount of negative curvature.

Curvature thus yields a precise graph-theoretic signal for identifying and potentially repairing sources of over-squashing.

4. Curvature-Guided Graph Rewiring Algorithms

Building on the link between negative curvature and over-squashing, a curvature-guided rewiring scheme was proposed: Stochastic Discrete Ricci Flow (SDRF) (Topping et al., 2021). SDRF proceeds by:

  • Identifying the edge with minimum (most negative) Balanced Forman curvature,
  • Proposing new candidate edges (within one-hop neighborhoods) that would most increase the curvature,
  • Stochastically selecting among these candidates (via a softmax distribution with temperature $\tau$) an edge to add to the graph,
  • Optionally removing the edge with the largest positive curvature, maintaining structural balance.

This approach “surgically” opens up bottlenecks—adding edges only where necessary to alleviate over-squashing—without indiscriminately densifying the graph, and with minimal disruption to global topology. Empirically, SDRF improves accuracy on benchmark node classification tasks, particularly on graphs with low homophily requiring integration of long-range signals.
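
A minimal sketch of one such rewiring iteration follows, reusing the simplified triangle-only curvature from the Section 3 sketch. The function name `sdrf_step`, the parameters `tau` and `curvature_ceiling`, and the improvement-based softmax scoring are our assumptions about a plausible implementation, not the reference code of Topping et al. (2021):

```python
import numpy as np

def curvature(A, i, j):
    """Triangle-only proxy for Balanced Forman curvature (4-cycle terms
    omitted, as in the Section 3 sketch)."""
    d_i, d_j = A[i].sum(), A[j].sum()
    if min(d_i, d_j) <= 1:
        return 0.0
    tri = float((A[i] * A[j]).sum())
    return 2/d_i + 2/d_j - 2 + 2*tri/max(d_i, d_j) + tri/min(d_i, d_j)

def sdrf_step(A, tau=1.0, curvature_ceiling=None, rng=None):
    """One SDRF-style rewiring iteration (a sketch, not the reference code)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    n = len(A)
    edges = [(i, j) for i in range(n) for j in range(i + 1, n) if A[i, j]]
    # 1. Locate the most negatively curved edge (i, j).
    i, j = min(edges, key=lambda e: curvature(A, *e))
    base = curvature(A, i, j)
    # 2. Score candidate edges (k, l), with k a neighbor of i and l a neighbor
    #    of j, by how much adding them would raise the curvature of (i, j).
    cands, gains = [], []
    for k in np.flatnonzero(A[i]):
        for l in np.flatnonzero(A[j]):
            if k != l and not A[k, l]:
                A[k, l] = A[l, k] = 1.0          # tentatively add
                gains.append(curvature(A, i, j) - base)
                A[k, l] = A[l, k] = 0.0          # undo
                cands.append((int(k), int(l)))
    if cands:
        # 3. Sample one candidate through a softmax over improvements.
        p = np.exp(np.array(gains) / tau)
        k, l = cands[rng.choice(len(cands), p=p / p.sum())]
        A[k, l] = A[l, k] = 1.0
    # 4. Optionally drop the most positively curved edge for structural balance.
    if curvature_ceiling is not None:
        u, v = max(edges, key=lambda e: curvature(A, *e))
        if curvature(A, u, v) > curvature_ceiling:
            A[u, v] = A[v, u] = 0.0
    return A
```

In practice one would iterate this step until the most negative curvature rises above a chosen threshold, matching the overall loop described above.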

5. Mathematical Theorems and Cheeger-Type Bounds

The central mathematical results are:

  • Sensitivity Bound: For nodes at distance $r+1$,

$$\left| \frac{\partial h_i^{(r+1)}}{\partial x_s} \right| \leq (\alpha\beta)^{r+1} \left(\hat{A}^{r+1}\right)_{is},$$

showing exponential decay of sensitivity.

  • Curvature-Jacobian Bound (Theorem 3): For an edge with curvature $\leq -2+\delta$, there exists a two-hop neighborhood $Q_j$ on which the average sensitivity is upper bounded by $(\alpha\beta)^2 \delta^{1/4}$, formalizing the role of negatively curved edges as bottlenecks.
  • Curvature vs. Cheeger Constant: If every edge satisfies $\text{Ric}(i, j) \geq k > 0$, then the Cheeger constant satisfies $h_G \geq k/2$, linking positive curvature to expansion and the absence of severe bottlenecks, and thus to reduced over-squashing (see the sketch below).
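
The Cheeger constant here is the usual edge-expansion quantity $h_G = \min_{|S| \le n/2} |\partial S| / |S|$; the exact convention varies slightly across references, so this is our assumption. A brute-force sketch for small graphs, with names of our choosing:

```python
import numpy as np
from itertools import combinations

def cheeger_constant(A):
    """Brute-force h_G = min over cuts S (|S| <= n/2) of |boundary(S)| / |S|.
    Exponential in n; intended only for small illustrative graphs."""
    n = len(A)
    best = float("inf")
    for size in range(1, n // 2 + 1):
        for S in combinations(range(n), size):
            T = [v for v in range(n) if v not in S]
            boundary = A[np.ix_(list(S), T)].sum()   # edges leaving S
            best = min(best, boundary / size)
    return best

# 6-cycle: the best cut splits it into two 3-node arcs, crossing 2 edges.
A = np.zeros((6, 6))
for v in range(6):
    A[v, (v + 1) % 6] = A[(v + 1) % 6, v] = 1.0
print(cheeger_constant(A))   # 2/3 = 0.666...
```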

6. Broader Implications and Future Directions

Curvature-based diagnostics provide both interpretable topological measurements for over-squashing and actionable levers (via rewiring) to locally improve information flow in GNNs. Unlike global diffusion-based approaches, surgical curvature-guided rewiring allows for principled intervention without unnecessary edge addition, preserving the essential semantic structure of the input graph.

Limitations of the current framework include its restriction to unweighted graphs; generalization to weighted graphs, multigraphs, and feature-dependent curvature metrics remains open. Additionally, the trade-off between topology preservation and optimal signal propagation, as well as integration with newer GNN architectures, remains an active area of research.

Ultimately, the over-squashing bound, as formalized through Jacobian sensitivity decay rates and curvature-based bottleneck detection, has deepened theoretical understanding of expressivity in GNNs. It provides a rigorous foundation for designing deeper, more expressive architectures capable of capturing long-range dependencies by precisely targeting and repairing the specific topological flaws most responsible for information attenuation (Topping et al., 2021).

References

1. Topping, J., Di Giovanni, F., Chamberlain, B. P., Dong, X., & Bronstein, M. M. (2021). Understanding Over-Squashing and Bottlenecks on Graphs via Curvature. arXiv:2111.14522.
