- The paper introduces Graph Diffusion Convolution (GDC), a novel preprocessing method that replaces traditional graph representations with a diffusion-based matrix to capture multi-hop relationships.
- It employs Personalized PageRank and Heat Kernel coefficients to efficiently compute and sparsify the diffusion matrix, mitigating noise from arbitrarily defined edges.
- Empirical results show that GDC consistently improves the accuracy of various graph models, especially in scenarios with noisy data or few labels.
This paper introduces Graph Diffusion Convolution (GDC), a preprocessing technique designed to enhance the performance of various graph learning models, including Graph Neural Networks (GNNs) and traditional graph algorithms like spectral clustering. GDC replaces the standard adjacency matrix $A$ or its derived transition matrix $T$ with a new matrix ($\tilde{S}$ or $\tilde{T}$) obtained from generalized graph diffusion followed by sparsification. The core idea is that incorporating information from multi-hop neighbors via diffusion mitigates issues arising from the noisy or arbitrarily defined edges often found in real-world graphs, effectively acting as a denoising filter.
Core Concept: Generalized Graph Diffusion
Generalized graph diffusion is defined by the matrix:

$$S = \sum_{k=0}^{\infty} \theta_k T^k$$
Where:
- $T$ is a generalized transition matrix, e.g., the random-walk matrix $T_{rw} = D^{-1}A$ or the symmetric matrix $T_{sym} = D^{-1/2} A D^{-1/2}$. The paper recommends the symmetric transition matrix with added self-loops: $\tilde{T}_{sym} = (D + I)^{-1/2} (A + I) (D + I)^{-1/2}$.
- $\theta_k$ are weighting coefficients that determine the type of diffusion. The paper focuses on two popular choices with closed-form solutions (a small sketch of both follows this list):
  - Personalized PageRank (PPR): $\theta_k = \alpha(1-\alpha)^k$, corresponding to a random walk with teleport probability $\alpha$.
  - Heat Kernel: $\theta_k = e^{-t}\,\frac{t^k}{k!}$, corresponding to heat diffusion over time $t$.
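To make the two coefficient choices concrete, here is a minimal sketch (hypothetical helper names, not from the paper's code) that builds the truncated series $S \approx \sum_{k=0}^{K-1} \theta_k T^k$ for either choice; for PPR the truncation error shrinks as $(1-\alpha)^K$:

```python
import numpy as np
from scipy.special import factorial

def diffusion_coefficients(kind, K, alpha=0.15, t=5.0):
    """Return theta_0 .. theta_{K-1} for PPR or Heat Kernel diffusion."""
    k = np.arange(K)
    if kind == "ppr":
        return alpha * (1 - alpha) ** k             # theta_k = alpha (1-alpha)^k
    elif kind == "heat":
        return np.exp(-t) * t ** k / factorial(k)   # theta_k = e^{-t} t^k / k!
    raise ValueError(f"unknown diffusion kind: {kind}")

def truncated_diffusion(T, theta):
    """S ~= sum_k theta_k T^k, truncated at len(theta) terms.
    T: dense (N, N) float transition matrix (illustration only)."""
    S = np.zeros_like(T)
    T_power = np.eye(T.shape[0])   # T^0
    for th in theta:
        S += th * T_power
        T_power = T_power @ T
    return S
```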
Graph Diffusion Convolution (GDC) Process
GDC is implemented as a plug-and-play preprocessing step:
- Compute Diffusion Matrix ($S$): Given the graph's adjacency matrix $A$, calculate the chosen transition matrix $T$ (e.g., $\tilde{T}_{sym}$). Then compute the diffusion matrix $S = \sum_k \theta_k T^k$ using the selected coefficients $\theta_k$ (PPR or Heat Kernel). For these two choices $S$ has a closed form; e.g., for PPR with $T_{rw}$, $S = \alpha\,(I - (1-\alpha)T_{rw})^{-1}$.
- Sparsify Diffusion Matrix ($\tilde{S}$): The resulting matrix $S$ is typically dense. To maintain computational efficiency for downstream models, $S$ is sparsified to create $\tilde{S}$. Two methods are proposed:
  - Top-$k$: Keep only the $k$ largest entries per column (node). This yields a regular graph structure, which can be beneficial for batching.
  - Epsilon-threshold ($\epsilon$): Set all entries $S_{ij} < \epsilon$ to zero.
The paper notes that sparsification empirically often improves performance, suggesting it removes weak, potentially noisy connections.
- Apply Model: Use the sparsified matrix $\tilde{S}$ (or a transition matrix derived from it, e.g., $\tilde{T}_{rw} = \tilde{D}^{-1}\tilde{S}$, where $\tilde{D}$ is the diagonal degree matrix of $\tilde{S}$) as the input graph structure for any existing graph-based algorithm (GCN, GAT, DeepWalk, Spectral Clustering, etc.) without changing the downstream model's architecture; see the sketch after this list.
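A minimal sketch of this derived transition matrix (assuming a SciPy sparse $\tilde{S}$, e.g., as produced by the sparsification code below):

```python
import numpy as np
from scipy.sparse import csr_matrix, diags

def transition_from_sparsified(S_tilde: csr_matrix) -> csr_matrix:
    """Row-normalized transition matrix T~_rw = D~^{-1} S~,
    where D~ is the diagonal matrix of row sums of S~."""
    row_sums = np.asarray(S_tilde.sum(axis=1)).ravel()
    row_sums[row_sums == 0] = 1.0   # guard: leave empty rows as all-zero
    return diags(1.0 / row_sums) @ S_tilde
```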
Implementation Example (Python)
```python
import numpy as np
from scipy.sparse import csr_matrix, diags, identity

def compute_transition_matrix(adj, self_loop_weight=1.0):
    """Symmetric transition matrix with self-loops:
    T_sym = (D + I)^{-1/2} (A + I) (D + I)^{-1/2}."""
    adj_loop = adj + self_loop_weight * identity(adj.shape[0])
    deg = np.asarray(adj_loop.sum(axis=1)).ravel()   # degrees incl. self-loops
    d_inv_sqrt = diags(1.0 / np.sqrt(deg))
    return d_inv_sqrt @ adj_loop @ d_inv_sqrt

def compute_ppr_diffusion(T_sym, alpha=0.15, approximate=False):
    """PPR diffusion matrix S = alpha (I - (1 - alpha) T)^{-1}, returned dense."""
    N = T_sym.shape[0]
    if approximate:
        # For large graphs, plug in a localized push-style approximation
        # (e.g., Andersen et al. 2006) instead of a full inverse.
        raise NotImplementedError("use an approximate PPR solver here")
    # Exact closed form: feasible only for small/medium graphs,
    # since the inverse is dense (O(N^2) memory, O(N^3) time).
    M = np.eye(N) - (1.0 - alpha) * T_sym.toarray()
    return alpha * np.linalg.inv(M)

def sparsify_diffusion(S, method='eps', param=1e-4):
    """Sparsify the dense diffusion matrix S into S~ (CSR)."""
    if method == 'eps':
        # Epsilon-threshold: zero out entries below param.
        S = S.copy()
        S[S < param] = 0.0
        return csr_matrix(S)   # csr_matrix stores only the nonzeros
    elif method == 'topk':
        # Top-k: keep the k largest entries in each column.
        k = int(param)
        rows, cols, vals = [], [], []
        for j in range(S.shape[1]):
            col = S[:, j]
            top_k = np.argpartition(col, -k)[-k:]   # unordered top-k, O(N)
            rows.extend(top_k)
            cols.extend([j] * k)
            vals.extend(col[top_k])
        return csr_matrix((vals, (rows, cols)), shape=S.shape)
    raise ValueError(f"Unknown sparsification method: {method}")
```
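The heat kernel can be handled analogously via the matrix exponential, using the identity $S = e^{-t}e^{tT} = \exp\!\big(t\,(T - I)\big)$ (valid since $I$ and $T$ commute). Below is a short usage sketch on a random toy graph reusing the functions above; the toy setup is our own illustration, not from the paper:

```python
import numpy as np
from scipy.sparse import identity, random as sparse_random
from scipy.sparse.linalg import expm

def compute_heat_diffusion(T_sym, t=5.0):
    """Heat kernel diffusion S = exp(t (T - I)), returned dense."""
    S = expm((t * (T_sym - identity(T_sym.shape[0]))).tocsc())
    return np.asarray(S.todense())

# Toy end-to-end run (random graph; real usage would load a dataset's adjacency).
A = sparse_random(100, 100, density=0.05, format='csr')
A = A + A.T                                    # symmetrize the adjacency
T = compute_transition_matrix(A)
S = compute_ppr_diffusion(T, alpha=0.15)       # or: compute_heat_diffusion(T, t=5.0)
S_tilde = sparsify_diffusion(S, method='topk', param=64)
```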
Spectral Analysis Insights
- GDC acts as a low-pass filter spectrally, similar to polynomial filters on the graph Laplacian. It amplifies low-frequency signals (smooth variations, communities) and dampens high-frequency signals (noise, sharp variations).
- The diffusion step transforms each eigenvalue $\lambda_i$ of $T$ into $\tilde{\lambda}_i = \sum_k \theta_k \lambda_i^k$. For PPR this gives $\tilde{\lambda}_i = \frac{\alpha}{1 - (1-\alpha)\lambda_i}$; for the Heat Kernel, $\tilde{\lambda}_i = e^{t(\lambda_i - 1)}$. A numerical check follows this list.
- Unlike pure spectral methods, GDC avoids explicit eigendecomposition, preserves spatial locality, and is not tied to a single graph's eigenbasis, so it can also be applied to unseen graphs.
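A quick numerical check of the PPR eigenvalue map (a toy sketch, not from the paper):

```python
import numpy as np

# Toy symmetric transition matrix T = D^{-1/2} A D^{-1/2} on a random dense graph.
rng = np.random.default_rng(0)
A = rng.random((50, 50))
A = (A + A.T) / 2
deg = A.sum(axis=1)
T = A / np.sqrt(np.outer(deg, deg))

alpha = 0.15
S = alpha * np.linalg.inv(np.eye(50) - (1 - alpha) * T)

lam = np.linalg.eigvalsh(T)                   # ascending eigenvalues of T
lam_tilde = alpha / (1 - (1 - alpha) * lam)   # predicted spectrum of S
# The map is monotone increasing, so sorted spectra line up entry by entry.
assert np.allclose(np.linalg.eigvalsh(S), lam_tilde)
```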
Practical Considerations
- Scalability: Computing the exact dense diffusion matrix $S$ can be infeasible for large graphs. The paper relies on efficient approximation algorithms for PPR and the Heat Kernel that achieve linear time and space complexity ($O(N)$), making GDC practical; a push-style sketch follows this list.
- Sparsification: Choosing the sparsification method (top-$k$ or $\epsilon$-threshold) and its parameter ($k$ or $\epsilon$) is important. Experiments (Fig 3) suggest that aiming for a fixed average degree (e.g., 64 or 128) via top-$k$ often works well across datasets and can outperform using the original graph's sparsity.
- Hyperparameters: The primary hyperparameters are the diffusion type (PPR or Heat Kernel), its parameter ($\alpha$ or $t$), and the sparsification parameter. Experiments suggest $\alpha \in [0.05, 0.2]$ and $t \in [1, 10]$ are typically effective (Fig 5).
- Applicability: GDC works as a preprocessing step. It generates a new graph representation ($\tilde{S}$ or $\tilde{T}$) that replaces the original one as input to any standard graph algorithm, without requiring modifications to the algorithm itself.
- Limitations: GDC relies heavily on the homophily assumption ("birds of a feather flock together"). It may not perform well on heterophilic graphs or tasks like link prediction where preserving the original edge structure is crucial.
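For the scalability point above, here is a minimal sketch of a push-style approximate PPR column in the spirit of Andersen et al. (2006). This is a simplified, non-lazy variant of our own (function name and details are illustrative, not the paper's implementation); it assumes an unweighted graph with no isolated nodes:

```python
import numpy as np
from scipy.sparse import csr_matrix

def approx_ppr_column(adj: csr_matrix, seed: int, alpha=0.15, eps=1e-4):
    """Approximate one PPR column by residual pushing. Only nodes near the
    seed are touched, so the cost does not grow with the graph size."""
    deg = np.asarray(adj.sum(axis=1)).ravel()
    p = {}                          # approximate PPR mass per node
    r = {seed: 1.0}                 # residual mass still to distribute
    queue = [seed]
    while queue:
        u = queue.pop()
        ru = r.get(u, 0.0)
        if ru < eps * deg[u]:       # residual too small: leave it as error
            continue
        r[u] = 0.0
        p[u] = p.get(u, 0.0) + alpha * ru
        # Push the remaining (1 - alpha) fraction of the mass to u's neighbors.
        spread = (1.0 - alpha) * ru / deg[u]
        for v in adj.indices[adj.indptr[u]:adj.indptr[u + 1]]:
            r[v] = r.get(v, 0.0) + spread
            if r[v] >= eps * deg[v]:
                queue.append(v)     # duplicates are fine: re-checked on pop
    return p
```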
Experimental Results Summary
- GDC consistently and significantly improved accuracy across various GNN models (GCN, GAT, JK, GIN, ARMA) and unsupervised methods (DCSBM, Spectral Clustering, DeepWalk, DGI) on multiple benchmark datasets (Cora, Citeseer, PubMed, Coauthor CS, Amazon Comp/Photo) for node classification and clustering tasks.
- The improvement was often more pronounced for models that initially performed poorly and in low-label scenarios.
- Using PPR or Heat Kernel coefficients generally outperformed coefficients learned via methods like AdaDIF.
- The symmetric transition matrix ($T_{sym}$) with self-loops generally performed best.
In essence, GDC provides a practical and broadly applicable method to enhance graph-based models by leveraging the smoothing properties of graph diffusion, effectively creating a more robust graph representation before model application. Its main strength lies in its plug-and-play nature and the consistent performance gains observed across diverse tasks and models, provided the underlying graph exhibits homophily.