- The paper introduces Graph Diffusion Convolution (GDC), a novel preprocessing method that replaces traditional graph representations with a diffusion-based matrix to capture multi-hop relationships.
- It employs Personalized PageRank and Heat Kernel coefficients to efficiently compute and sparsify the diffusion matrix, mitigating noise from arbitrarily defined edges.
- Empirical results show that GDC consistently improves the accuracy of various graph models, especially in scenarios with noisy data or few labels.
This paper introduces Graph Diffusion Convolution (GDC), a preprocessing technique designed to enhance the performance of various graph learning models, including Graph Neural Networks (GNNs) and traditional graph algorithms like spectral clustering. GDC replaces the standard adjacency matrix $A$ or its derived transition matrix $T$ with a new matrix ($\tilde{S}$ or $\tilde{T}$) obtained from generalized graph diffusion followed by sparsification. The core idea is that incorporating information from multi-hop neighbors via diffusion mitigates issues arising from the noisy or arbitrarily defined edges often found in real-world graphs, effectively acting as a denoising filter.
Core Concept: Generalized Graph Diffusion
Generalized graph diffusion is defined by the matrix:

$$S = \sum_{k=0}^{\infty} \theta_k T^k$$
Where:
- $T$ is a generalized transition matrix, e.g., the random-walk matrix $T_{rw} = D^{-1}A$ or the symmetric matrix $T_{sym} = D^{-1/2} A D^{-1/2}$. The paper recommends the symmetric transition matrix with added self-loops: $\tilde{T}_{sym} = (D + I)^{-1/2} (A + I) (D + I)^{-1/2}$.
- $\theta_k$ are weighting coefficients that determine the type of diffusion. The paper focuses on two popular choices with closed-form solutions (a small sketch of both follows this list):
  - Personalized PageRank (PPR): $\theta_k = \alpha(1-\alpha)^k$, corresponding to a random walk with teleport probability $\alpha$.
  - Heat Kernel: $\theta_k = e^{-t}\,\frac{t^k}{k!}$, corresponding to heat diffusion over time $t$.
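To make the two coefficient choices concrete, here is a minimal sketch (hypothetical helper names, not from the paper's code) that builds the truncated series $S \approx \sum_{k=0}^{K-1} \theta_k T^k$ for either choice; for PPR the truncation error shrinks as $(1-\alpha)^K$:

```python
import numpy as np
from scipy.special import factorial

def diffusion_coefficients(kind, K, alpha=0.15, t=5.0):
    """Return theta_0 .. theta_{K-1} for PPR or Heat Kernel diffusion."""
    k = np.arange(K)
    if kind == "ppr":
        return alpha * (1 - alpha) ** k             # theta_k = alpha (1-alpha)^k
    elif kind == "heat":
        return np.exp(-t) * t ** k / factorial(k)   # theta_k = e^{-t} t^k / k!
    raise ValueError(f"unknown diffusion kind: {kind}")

def truncated_diffusion(T, theta):
    """S ~= sum_k theta_k T^k, truncated at len(theta) terms.
    T: dense (N, N) float transition matrix (illustration only)."""
    S = np.zeros_like(T)
    T_power = np.eye(T.shape[0])   # T^0
    for th in theta:
        S += th * T_power
        T_power = T_power @ T
    return S
```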
Graph Diffusion Convolution (GDC) Process
GDC is implemented as a plug-and-play preprocessing step:
- Compute Diffusion Matrix ($S$): Given the graph's adjacency matrix $A$, calculate the chosen transition matrix $T$ (e.g., $\tilde{T}_{sym}$). Then compute the diffusion matrix $S = \sum_k \theta_k T^k$ using the selected coefficients $\theta_k$ (PPR or Heat Kernel). For these two choices $S$ has a closed form; e.g., for PPR with $T_{rw}$, $S = \alpha\,(I - (1-\alpha)T_{rw})^{-1}$.
- Sparsify Diffusion Matrix ($\tilde{S}$): The resulting matrix $S$ is typically dense. To maintain computational efficiency for downstream models, $S$ is sparsified to create $\tilde{S}$. Two methods are proposed:
  - Top-$k$: Keep only the $k$ largest entries per column (node). This yields a regular graph structure, which can be beneficial for batching.
  - Epsilon-threshold ($\epsilon$): Set all entries $S_{ij} < \epsilon$ to zero.
The paper notes that sparsification empirically often improves performance, suggesting it removes weak, potentially noisy connections.
- Apply Model: Use the sparsified matrix $\tilde{S}$ (or a transition matrix derived from it, e.g., $\tilde{T}_{rw} = \tilde{D}^{-1}\tilde{S}$, where $\tilde{D}$ is the diagonal degree matrix of $\tilde{S}$) as the input graph structure for any existing graph-based algorithm (GCN, GAT, DeepWalk, Spectral Clustering, etc.) without changing the downstream model's architecture; see the sketch after this list.
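A minimal sketch of this derived transition matrix (assuming a SciPy sparse $\tilde{S}$, e.g., as produced by the sparsification code below):

```python
import numpy as np
from scipy.sparse import csr_matrix, diags

def transition_from_sparsified(S_tilde: csr_matrix) -> csr_matrix:
    """Row-normalized transition matrix T~_rw = D~^{-1} S~,
    where D~ is the diagonal matrix of row sums of S~."""
    row_sums = np.asarray(S_tilde.sum(axis=1)).ravel()
    row_sums[row_sums == 0] = 1.0   # guard: leave empty rows as all-zero
    return diags(1.0 / row_sums) @ S_tilde
```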
Implementation Example (Python)
```python
import numpy as np
from scipy.sparse import csr_matrix, diags, identity

def compute_transition_matrix(adj, self_loop_weight=1.0):
    """Symmetric transition matrix with self-loops:
    T_sym = (D + I)^{-1/2} (A + I) (D + I)^{-1/2}."""
    adj_loop = adj + self_loop_weight * identity(adj.shape[0])
    deg = np.asarray(adj_loop.sum(axis=1)).ravel()   # degrees incl. self-loops
    d_inv_sqrt = diags(1.0 / np.sqrt(deg))
    return d_inv_sqrt @ adj_loop @ d_inv_sqrt

def compute_ppr_diffusion(T_sym, alpha=0.15, approximate=False):
    """PPR diffusion matrix S = alpha (I - (1 - alpha) T)^{-1}, returned dense."""
    N = T_sym.shape[0]
    if approximate:
        # For large graphs, plug in a localized push-style approximation
        # (e.g., Andersen et al. 2006) instead of a full inverse.
        raise NotImplementedError("use an approximate PPR solver here")
    # Exact closed form: feasible only for small/medium graphs,
    # since the inverse is dense (O(N^2) memory, O(N^3) time).
    M = np.eye(N) - (1.0 - alpha) * T_sym.toarray()
    return alpha * np.linalg.inv(M)

def sparsify_diffusion(S, method='eps', param=1e-4):
    """Sparsify the dense diffusion matrix S into S~ (CSR)."""
    if method == 'eps':
        # Epsilon-threshold: zero out entries below param.
        S = S.copy()
        S[S < param] = 0.0
        return csr_matrix(S)   # csr_matrix stores only the nonzeros
    elif method == 'topk':
        # Top-k: keep the k largest entries in each column.
        k = int(param)
        rows, cols, vals = [], [], []
        for j in range(S.shape[1]):
            col = S[:, j]
            top_k = np.argpartition(col, -k)[-k:]   # unordered top-k, O(N)
            rows.extend(top_k)
            cols.extend([j] * k)
            vals.extend(col[top_k])
        return csr_matrix((vals, (rows, cols)), shape=S.shape)
    raise ValueError(f"Unknown sparsification method: {method}")
```
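The heat kernel can be handled analogously via the matrix exponential, using the identity $S = e^{-t}e^{tT} = \exp\!\big(t\,(T - I)\big)$ (valid since $I$ and $T$ commute). Below is a short usage sketch on a random toy graph reusing the functions above; the toy setup is our own illustration, not from the paper:

```python
import numpy as np
from scipy.sparse import identity, random as sparse_random
from scipy.sparse.linalg import expm

def compute_heat_diffusion(T_sym, t=5.0):
    """Heat kernel diffusion S = exp(t (T - I)), returned dense."""
    S = expm((t * (T_sym - identity(T_sym.shape[0]))).tocsc())
    return np.asarray(S.todense())

# Toy end-to-end run (random graph; real usage would load a dataset's adjacency).
A = sparse_random(100, 100, density=0.05, format='csr')
A = A + A.T                                    # symmetrize the adjacency
T = compute_transition_matrix(A)
S = compute_ppr_diffusion(T, alpha=0.15)       # or: compute_heat_diffusion(T, t=5.0)
S_tilde = sparsify_diffusion(S, method='topk', param=64)
```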
Spectral Analysis Insights
- GDC acts as a low-pass filter spectrally, similar to polynomial filters on the graph Laplacian. It amplifies low-frequency signals (smooth variations, communities) and dampens high-frequency signals (noise, sharp variations).
- The diffusion step transforms each eigenvalue $\lambda_i$ of $T$ into $\tilde{\lambda}_i = \sum_k \theta_k \lambda_i^k$. For PPR this gives $\tilde{\lambda}_i = \frac{\alpha}{1 - (1-\alpha)\lambda_i}$; for the Heat Kernel, $\tilde{\lambda}_i = e^{t(\lambda_i - 1)}$. A numerical check follows this list.
- Unlike pure spectral methods, GDC avoids explicit eigendecomposition, preserves spatial locality, and is not tied to a single graph's eigenbasis, so it can also be applied to unseen graphs.
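A quick numerical check of the PPR eigenvalue map (a toy sketch, not from the paper):

```python
import numpy as np

# Toy symmetric transition matrix T = D^{-1/2} A D^{-1/2} on a random dense graph.
rng = np.random.default_rng(0)
A = rng.random((50, 50))
A = (A + A.T) / 2
deg = A.sum(axis=1)
T = A / np.sqrt(np.outer(deg, deg))

alpha = 0.15
S = alpha * np.linalg.inv(np.eye(50) - (1 - alpha) * T)

lam = np.linalg.eigvalsh(T)                   # ascending eigenvalues of T
lam_tilde = alpha / (1 - (1 - alpha) * lam)   # predicted spectrum of S
# The map is monotone increasing, so sorted spectra line up entry by entry.
assert np.allclose(np.linalg.eigvalsh(S), lam_tilde)
```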
Practical Considerations
- Scalability: Computing the exact dense diffusion matrix $S$ can be infeasible for large graphs. The paper relies on efficient approximation algorithms for PPR and the Heat Kernel that achieve linear time and space complexity ($O(N)$), making GDC practical; a push-style sketch follows this list.
- Sparsification: Choosing the sparsification method (top-$k$ or $\epsilon$-threshold) and its parameter ($k$ or $\epsilon$) is important. Experiments (Fig 3) suggest that aiming for a fixed average degree (e.g., 64 or 128) via top-$k$ often works well across datasets and can outperform using the original graph's sparsity.
- Hyperparameters: The primary hyperparameters are the diffusion type (PPR or Heat Kernel), its parameter ($\alpha$ or $t$), and the sparsification parameter. Experiments suggest $\alpha \in [0.05, 0.2]$ and $t \in [1, 10]$ are typically effective (Fig 5).
- Applicability: GDC works as a preprocessing step. It generates a new graph representation ($\tilde{S}$ or $\tilde{T}$) that replaces the original one as input to any standard graph algorithm, without requiring modifications to the algorithm itself.
- Limitations: GDC relies heavily on the homophily assumption ("birds of a feather flock together"). It may not perform well on heterophilic graphs or tasks like link prediction where preserving the original edge structure is crucial.
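For the scalability point above, here is a minimal sketch of a push-style approximate PPR column in the spirit of Andersen et al. (2006). This is a simplified, non-lazy variant of our own (function name and details are illustrative, not the paper's implementation); it assumes an unweighted graph with no isolated nodes:

```python
import numpy as np
from scipy.sparse import csr_matrix

def approx_ppr_column(adj: csr_matrix, seed: int, alpha=0.15, eps=1e-4):
    """Approximate one PPR column by residual pushing. Only nodes near the
    seed are touched, so the cost does not grow with the graph size."""
    deg = np.asarray(adj.sum(axis=1)).ravel()
    p = {}                          # approximate PPR mass per node
    r = {seed: 1.0}                 # residual mass still to distribute
    queue = [seed]
    while queue:
        u = queue.pop()
        ru = r.get(u, 0.0)
        if ru < eps * deg[u]:       # residual too small: leave it as error
            continue
        r[u] = 0.0
        p[u] = p.get(u, 0.0) + alpha * ru
        # Push the remaining (1 - alpha) fraction of the mass to u's neighbors.
        spread = (1.0 - alpha) * ru / deg[u]
        for v in adj.indices[adj.indptr[u]:adj.indptr[u + 1]]:
            r[v] = r.get(v, 0.0) + spread
            if r[v] >= eps * deg[v]:
                queue.append(v)     # duplicates are fine: re-checked on pop
    return p
```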
Experimental Results Summary
- GDC consistently and significantly improved accuracy across various GNN models (GCN, GAT, JK, GIN, ARMA) and unsupervised methods (DCSBM, Spectral Clustering, DeepWalk, DGI) on multiple benchmark datasets (Cora, Citeseer, PubMed, Coauthor CS, Amazon Comp/Photo) for node classification and clustering tasks.
- The improvement was often more pronounced for models that initially performed poorly and in low-label scenarios.
- Using PPR or Heat Kernel coefficients generally outperformed coefficients learned via methods like AdaDIF.
- The symmetric transition matrix ($T_{sym}$) with self-loops generally performed best.
In essence, GDC provides a practical and broadly applicable method to enhance graph-based models by leveraging the smoothing properties of graph diffusion, effectively creating a more robust graph representation before model application. Its main strength lies in its plug-and-play nature and the consistent performance gains observed across diverse tasks and models, provided the underlying graph exhibits homophily.