Over-Smoothing Decay Rate in GNNs
- Over-smoothing decay rate is a measure quantifying the exponential convergence of GNN node features to a homogeneous limit, driven by spectral parameters and graph topology.
- The rate is governed by the sub-dominant eigenvalues of the propagation operator and is tracked by metrics such as Dirichlet energy, revealing the effects of graph structure and operator norms.
- Control strategies like residual connections, edge dropping, and nonlocal propagation techniques are effective in mitigating over-smoothing and preserving discriminative power.
Over-smoothing decay rate quantifies the exponential rate at which node feature differences or higher-order nontrivial directions collapse toward indistinguishability in deep graph neural networks (GNNs). In both theoretical analysis and empirical investigation, this rate emerges as a precise function of spectral or structural parameters of the propagation operator, feature transformations, and graph topology. Over-smoothing is thus directly characterized by how embeddings converge toward a low-rank or homogeneous limit as the number of layers increases.
1. Formal Definitions and Operator-Theoretic Foundations
Over-smoothing in GNNs refers to the phenomenon where, as layers are stacked, the representations of nodes become increasingly similar, often collapsing to a subspace with limited discriminative power. The canonical formalization is as follows:
- Graph convolutional update:
$X^{(k+1)} = \hat{A}\, X^{(k)} W^{(k)}$, where $\hat{A}$ is the normalized adjacency or aggregation matrix and $W^{(k)}$ a learnable feature transform.
- Power iteration perspective:
When $W^{(k)} = W$ is fixed (or orthogonal), the update is linear across depth. Vectorizing yields $\operatorname{vec}(X^{(k+1)}) = T\,\operatorname{vec}(X^{(k)})$ with $T = W^{\top} \otimes \hat{A}$, and the model implements power iteration by repeated application of $T$ (Roth, 2024).
- Exponential decay rate:
The convergence of $X^{(k)}$ toward the limit is governed by the sub-dominant eigenvalues: $\operatorname{dist}(X^{(k)}, X^{(\infty)}) = O\big((|\lambda_2|/|\lambda_1|)^{k}\big)$, where $X^{(\infty)}$ is the asymptotic rank-one limit and $\lambda_1, \lambda_2$ are the largest and second-largest eigenvalues of $T$ in modulus.
- Rank collapse:
More generally, a sequence $(X^{(k)})_k$ "rank-collapses" if $\min_{\operatorname{rank}(Y) \le 1} \|X^{(k)} - Y\| \to 0$ for rank-one $Y$, capturing both over-smoothing and over-correlation (Roth, 2024, Roth et al., 2023). A numerical sketch of this power-iteration view follows this list.
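To make the power-iteration view concrete, here is a minimal numpy sketch (all choices illustrative: a toy 5-node graph, a fixed random symmetric $W$, and Frobenius norm for the rank-one distance rather than the max-norm variant of Roth (2024)). It propagates features with $X \mapsto \hat{A} X W$, normalizes each iterate, and compares the collapse of the rank-one distance against the predicted per-layer rate $|\lambda_2|/|\lambda_1|$ of $T = W^{\top} \otimes \hat{A}$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy connected graph; GCN-style symmetric normalization with self-loops.
A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0],
              [1, 1, 0, 1, 0],
              [0, 1, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)
A_hat = A + np.eye(5)
d = A_hat.sum(1)
A_hat = A_hat / np.sqrt(np.outer(d, d))      # D^{-1/2} (A + I) D^{-1/2}

# Fixed symmetric feature transform: real spectrum, so the Kronecker
# operator T = W^T kron A_hat generically has a unique dominant direction.
M = rng.normal(size=(3, 3))
W = (M + M.T) / 2

T = np.kron(W.T, A_hat)                      # vec(A_hat X W) = T vec(X)
ev = np.sort(np.abs(np.linalg.eigvals(T)))[::-1]
print("predicted per-layer rate |lambda2|/|lambda1| =", ev[1] / ev[0])

def rank_one_distance(X):
    # Frobenius distance to the best rank-one approximation (Eckart-Young):
    # everything past the leading singular value.
    s = np.linalg.svd(X, compute_uv=False)
    return np.sqrt((s[1:] ** 2).sum())

X = rng.normal(size=(5, 3))
for k in range(1, 31):
    X = A_hat @ X @ W
    X = X / np.linalg.norm(X)                # factor out dominant growth/shrinkage
    if k % 5 == 0:
        print(f"k={k:2d}  rank-one distance = {rank_one_distance(X):.3e}")
```

Successive printed distances should shrink by roughly $(|\lambda_2|/|\lambda_1|)^5$ per row, matching the spectral prediction.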
2. Spectral Parameters and Universal Exponential Laws
The decay rate depends on the spectral properties of the propagation operator. Let $\hat{A}$, $L$, or $P$ denote the relevant matrix (normalized adjacency, Laplacian, or random-walk operator):
- Graph Laplacian and Dirichlet energy decay:
For continuous-time diffusion $\dot{X}(t) = -L X(t)$, the Dirichlet energy satisfies $E(X(t)) \le e^{-2\lambda_1 t} E(X(0))$, where $\lambda_1$ is the first non-trivial eigenvalue of $L$ (Guan et al., 7 Dec 2025, Shao et al., 2023).
- Discrete propagation:
For $X^{(k+1)} = \hat{A} X^{(k)}$ with $\hat{A} = I - L$, $E(X^{(k)}) \le q^{2k} E(X^{(0)})$ with $q = \max\{1 - \lambda_1,\ \lambda_{\max} - 1\}$ and $\lambda_{\max}$ the largest Laplacian eigenvalue; the decay factor $q$ is always strictly less than $1$ on non-bipartite graphs (Guan et al., 7 Dec 2025). A numerical check of this bound appears after this list.
- Normalized adjacency and general GNN:
For linear propagation ($X^{(k+1)} = \hat{A} X^{(k)} W^{(k)}$), if $s$ is the supremum of layer operator norms $\|W^{(k)}\|$ and $\lambda$ is the second-largest eigenvalue modulus of $\hat{A}$, then $d(X^{(k+1)}) \le s\lambda\, d(X^{(k)})$, with $s\lambda < 1$ implying exponential collapse and $d(\cdot)$ quantifying distance from the smoothed subspace (Sun, 2022, Huang et al., 2020).
- Feature difference decay:
For regular graphs or strong curvature (see below), $\max_{(u,v) \in E} \|x_u^{(k)} - x_v^{(k)}\| = O(\delta^{k})$, with $\delta < 1$ depending on neighborhood overlap (curvature-based bounds) or on the spectral gap (Nguyen et al., 2022).
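A minimal numerical check of the discrete Dirichlet-energy bound, assuming $\hat{A} = I - L$ with the symmetric-normalized Laplacian and a toy non-bipartite graph chosen only for illustration:

```python
import numpy as np

# Toy non-bipartite graph (contains a triangle), so the decay factor q < 1.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
d = A.sum(1)
L_hat = np.eye(4) - A / np.sqrt(np.outer(d, d))   # symmetric-normalized Laplacian
A_hat = np.eye(4) - L_hat                         # propagation operator

mu = np.linalg.eigvalsh(L_hat)                    # ascending: 0 = mu_0 < mu_1 <= ... <= mu_max
q = max(1 - mu[1], mu[-1] - 1)                    # predicted per-layer decay factor

def dirichlet_energy(X):
    # E(X) = tr(X^T L X), the Laplacian quadratic form.
    return float(np.trace(X.T @ L_hat @ X))

X = np.random.default_rng(1).normal(size=(4, 2))
E0 = dirichlet_energy(X)
for k in range(1, 21):
    X = A_hat @ X
    if k % 4 == 0:
        print(f"k={k:2d}  E/E0 = {dirichlet_energy(X)/E0:.3e}  bound q^(2k) = {q**(2*k):.3e}")
```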
3. Metrics: Dirichlet Energy, High-Frequency Content, and Rank-One Distance
Several metrics precisely characterize the collapse rate:
| Metric Name | Mathematical Definition | Collapse Rate |
|---|---|---|
| Dirichlet Energy ($E$) | $E(X) = \operatorname{tr}(X^{\top} L X) = \tfrac{1}{2} \sum_{(u,v) \in E} \lVert x_u/\sqrt{d_u} - x_v/\sqrt{d_v} \rVert^2$ | $O(q^{2k})$ |
| High-Frequency Energy | $\lVert P_{\mathrm{high}} X \rVert_F^2$, where $P_{\mathrm{high}}$ projects onto high eigenspaces of $L$ | $O(\lambda^{2k})$ |
| Rank-One Distance (ROD) | $\min_{u,v} \lVert X - u v^{\top} \rVert$ for max-norm (column vector $u$, row vector $v^{\top}$) | $O\big((\lvert\lambda_2\rvert/\lvert\lambda_1\rvert)^{k}\big)$ |
- Empirically, all these metrics show clean exponential decay on standard GNNs, with the observed exponent matching the theoretical spectral rate (Roth, 2024, Roth et al., 2023, Yang et al., 2024); a combined computation of the three metrics is sketched after this list.
- Residual connections, normalization schemes, or explicit anti-collapse measures can alter or arrest this decay, preventing over-smoothing (Roth, 2024, Huang et al., 2020).
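A compact sketch computing all three metrics from the table for a given feature matrix. Here $P_{\mathrm{high}}$ projects onto eigenspaces with Laplacian eigenvalue above an arbitrary threshold, and ROD is taken in the Frobenius norm (a simplification of the max-norm definition):

```python
import numpy as np

def collapse_metrics(X, L_hat, thresh=0.5):
    """Return (Dirichlet energy, high-frequency energy, rank-one distance)."""
    # Dirichlet energy: Laplacian quadratic form tr(X^T L X).
    dirichlet = float(np.trace(X.T @ L_hat @ X))

    # High-frequency energy: squared norm of the projection onto
    # eigenspaces with eigenvalue above `thresh` (threshold is arbitrary).
    mu, U = np.linalg.eigh(L_hat)
    U_high = U[:, mu > thresh]
    high_freq = float(np.linalg.norm(U_high.T @ X) ** 2)

    # Rank-one distance: Frobenius residual of the best rank-one
    # approximation, i.e. all singular values past the first (Eckart-Young).
    s = np.linalg.svd(X, compute_uv=False)
    rod = float(np.sqrt((s[1:] ** 2).sum()))
    return dirichlet, high_freq, rod

# Example on a triangle graph (all degrees 2, so L_hat = I - A/2).
A = np.ones((3, 3)) - np.eye(3)
L_hat = np.eye(3) - A / 2.0
X = np.random.default_rng(0).normal(size=(3, 2))
print(collapse_metrics(X, L_hat))
```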
4. Structural and Geometric Factors Affecting Decay
The spectral gap, graph topology, normalization, and even graph curvature control over-smoothing rates:
- Spectral gap ($1 - \lambda$):
A larger gap accelerates smoothing (smaller $\lambda$), as seen in complete or small-world graphs; sparse or large-diameter graphs slow it (Sun, 2022, Huang et al., 2020). A comparison across topologies is sketched after this list.
- Ollivier-Ricci curvature:
On regular graphs with lower-bounded positive curvature $\kappa(u,v) \ge \kappa_0 > 0$, feature differences between neighbors contract geometrically as $O((1 - \kappa_0)^k)$ with $1 - \kappa_0 < 1$: more positive curvature, faster collapse (Nguyen et al., 2022).
- Nonlocal/Algebraic Smoothing:
Proposed nonlocal PDE dynamics, with adaptive energy-dependent propagation, yield algebraic decay rather than exponential, thus fundamentally attenuating over-smoothing (Guan et al., 9 Dec 2025).
- Operator Consistency:
Operator-consistent GNNs (fixed $\hat{A}$ across layers) always exhibit exponential over-smoothing; operator-inconsistent ones (e.g., layerwise-varying attention) can avoid it if per-layer stationary distributions do not converge (Zhao et al., 2022).
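To illustrate the topology dependence, a small sketch comparing the predicted decay factor $q$ on a complete graph versus a path graph. GCN-style self-loops are added so the bipartite edge case $\lambda_{\max} = 2$ cannot occur; graph sizes are arbitrary:

```python
import numpy as np

def decay_factor(A):
    """Per-layer decay factor q = max(1 - mu_1, mu_max - 1) of I - L_hat,
    computed with GCN-style self-loops."""
    n = len(A)
    A_hat = A + np.eye(n)
    d = A_hat.sum(1)
    L_hat = np.eye(n) - A_hat / np.sqrt(np.outer(d, d))
    mu = np.linalg.eigvalsh(L_hat)
    return max(1 - mu[1], mu[-1] - 1)

n = 10
complete = np.ones((n, n)) - np.eye(n)                           # large spectral gap
path = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)  # large diameter

print("complete graph q:", decay_factor(complete))  # 0.0: one layer fully smooths
print("path graph q:    ", decay_factor(path))      # near 1: smoothing is slow
```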
5. Theoretical and Empirical Validation Across GNN Variants
Empirical studies and theoretical analysis consistently validate the spectral decay law:
- Experiments on Cora, Citeseer, Pubmed, and synthetic benchmarks confirm that Dirichlet energy or high-frequency energy decays linearly on a log scale, with slope precisely predicted by $\lambda_1$ or related spectral quantities (Guan et al., 7 Dec 2025, Yang et al., 2024, Roth et al., 2023); a slope-fitting sketch appears after this list.
- For plain GCN, GAT, SAGE, and similar models, the collapse rate matches $(|\lambda_2|/|\lambda_1|)^k$ with $\lambda_1, \lambda_2$ the dominant and sub-dominant eigenvalues of $T = W^{\top} \otimes \hat{A}$ (Roth, 2024, Roth et al., 2023).
- Skip-connections, adaptive aggregation, and DropEdge slow the decay (increasing the effective decay factor toward $1$), while true algebraic smoothing (nonlocal models) further reduces the decay to an algebraic (power-law) rate (Guan et al., 9 Dec 2025, Huang et al., 2020).
- In practice, over-smoothing in typical GCN architectures emerges within a few (4–8) layers for graphs with moderate spectral gap (Arroyo et al., 15 Feb 2025).
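The log-linear behavior is easy to reproduce. The sketch below (toy graph, linear propagation, numpy's `polyfit` for the least-squares slope) fits $\log E(k)$ against depth $k$ and compares the slope to the spectral prediction $2 \log q$:

```python
import numpy as np

# Small connected graph; self-loops added below guarantee q < 1.
A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0],
              [1, 1, 0, 1, 1],
              [0, 1, 1, 0, 1],
              [0, 0, 1, 1, 0]], dtype=float)
A_hat = A + np.eye(5)
d = A_hat.sum(1)
A_hat = A_hat / np.sqrt(np.outer(d, d))
L_hat = np.eye(5) - A_hat

mu = np.linalg.eigvalsh(L_hat)
q = max(1 - mu[1], mu[-1] - 1)

X = np.random.default_rng(2).normal(size=(5, 3))
energies = []
for _ in range(24):
    X = A_hat @ X
    energies.append(float(np.trace(X.T @ L_hat @ X)))

# Least-squares fit of log E(k) = a*k + b; the slope a estimates the
# per-layer log decay rate and should approach 2*log(q) with depth.
ks = np.arange(1, 25)
slope, _ = np.polyfit(ks, np.log(energies), 1)
print("fitted slope:", slope, "  spectral prediction 2*log(q):", 2 * np.log(q))
```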
6. Implications, Control Strategies, and Generalizations
The explicit connection between over-smoothing decay rate and graph/model parameters enables effective interventions and broader generalization:
- Slowing over-smoothing:
Decrease the spectral gap (e.g., edge dropping/sparsification), introduce residual or skip connections, design multi-aggregator (sum of Kronecker products) architectures, and employ nonlocal propagation (Roth, 2024, Roth et al., 2023, Guan et al., 9 Dec 2025).
- Avoidance regimes:
Time-inhomogeneous processes or GATs with layerwise-regularized attention can avoid exponential collapse if a uniform gap between per-layer stationary distributions is enforced (Zhao et al., 2022).
- Unified view with vanishing gradients:
The decay rate of over-smoothing is fundamentally linked with gradient-vanishing phenomena; when the contraction constant per layer is less than one, both feature and gradient norms decay exponentially (Arroyo et al., 15 Feb 2025). A sensitivity-decay sketch follows this list.
- Hilbert and Banach space generalization:
In dynamical/functional analysis, smoothing by exponentially stable semigroups yields decay of the form $O(e^{-\omega t})$ for regularizing operators, reinforcing the universality of the exponential smoothing rate (Wakaiki, 2022).
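As a minimal illustration of the gradient link, in a deliberately contractive toy setting (deep linear GNN, fixed $W = 0.9 I$, so the per-layer contraction constant is below one): by linearity, the finite difference below equals the Jacobian of the depth-$k$ map applied to the input perturbation, and its norm shrinks exponentially in depth, mirroring the feature collapse.

```python
import numpy as np

# Deep linear GNN X -> A_hat X W applied k times; its Jacobian on vec(X)
# is T^k with T = W^T kron A_hat, so input sensitivity contracts at the
# same exponential rate that drives over-smoothing.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
A_hat = A + np.eye(4)
d = A_hat.sum(1)
A_hat = A_hat / np.sqrt(np.outer(d, d))
W = 0.9 * np.eye(2)                         # contraction constant s = 0.9 < 1

def forward(X, k):
    for _ in range(k):
        X = A_hat @ X @ W
    return X

rng = np.random.default_rng(3)
X0 = rng.normal(size=(4, 2))
V = rng.normal(size=(4, 2))                 # input perturbation

for k in (2, 6, 10, 14):
    # Exact Jacobian-vector product, by linearity of the network.
    sensitivity = np.linalg.norm(forward(X0 + V, k) - forward(X0, k))
    print(f"k={k:2d}  ||J_k V|| = {sensitivity:.3e}")   # shrinks ~ (s*lambda1)^k = 0.9^k
```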
In summary, the over-smoothing decay rate is a sharp, model-agnostic descriptor of the exponential contraction of GNN feature differences and higher-order graph functionals, directly controlled by the spectral (or geometric) characteristics of the graph and the structure of the propagation operator. This rate interfaces with fundamental limitations in learning, expressivity, and gradient flow in deep GNNs, and drives the design of contemporary anti-over-smoothing techniques across both theory and practice (Roth, 2024, Guan et al., 7 Dec 2025, Sun, 2022, Roth et al., 2023, Guan et al., 9 Dec 2025, Zhao et al., 2022, Yang et al., 2024, Arroyo et al., 15 Feb 2025, Nguyen et al., 2022, Huang et al., 2020, Wakaiki, 2022).