
Graph Wavelet Neural Network (1904.07785v1)

Published 12 Apr 2019 in cs.LG and stat.ML

Abstract: We present graph wavelet neural network (GWNN), a novel graph convolutional neural network (CNN), leveraging graph wavelet transform to address the shortcomings of previous spectral graph CNN methods that depend on graph Fourier transform. Different from graph Fourier transform, graph wavelet transform can be obtained via a fast algorithm without requiring matrix eigendecomposition with high computational cost. Moreover, graph wavelets are sparse and localized in vertex domain, offering high efficiency and good interpretability for graph convolution. The proposed GWNN significantly outperforms previous spectral graph CNNs in the task of graph-based semi-supervised classification on three benchmark datasets: Cora, Citeseer and Pubmed.

Citations (307)

Summary

  • The paper introduces GWNN, replacing the traditional graph Fourier transform with graph wavelet transform to improve computational efficiency and local feature representation.
  • It detaches feature transformation from convolution, reducing parameter complexity from O(n·p·q) to O(n+p·q) and mitigating overfitting in semi-supervised scenarios.
  • Experiments on Cora, Citeseer, and Pubmed demonstrate that GWNN outperforms existing spectral methods with sparse, interpretable, and scalable graph convolutions.

This paper, "Graph Wavelet Neural Network" (Graph Wavelet Neural Network, 2019), introduces a novel graph convolutional neural network (CNN) model called GWNN, which leverages graph wavelet transform (GWT) instead of the traditional graph Fourier transform (GFT) used in many spectral graph CNNs. The core motivation is to address the limitations of GFT-based methods, such as high computational cost due to eigendecomposition, lack of sparsity, and non-locality of the resulting convolution.

Limitations of Graph Fourier Transform for CNNs

Spectral graph CNNs traditionally define convolution using GFT based on the eigenvectors of the graph Laplacian matrix. While this allows defining filters in the spectral domain, it suffers from several practical drawbacks:

  1. High Computational Cost: Computing the eigendecomposition of the graph Laplacian requires $O(n^3)$ time, where $n$ is the number of nodes, which is prohibitive for large graphs.
  2. Inefficiency: The eigenvectors of the Laplacian are generally dense, making GFT and inverse GFT operations computationally expensive ($O(n^2)$).
  3. Non-locality: Graph convolution defined via GFT is not localized in the vertex domain, meaning the influence on a node's signal is not restricted to its immediate neighborhood.
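
For reference, the GFT-based convolution these drawbacks refer to has the standard spectral form $x *_{\mathcal{G}} h = U\left((U^\top h) \odot (U^\top x)\right)$, where $U$ collects the Laplacian eigenvectors and $\odot$ denotes the elementwise product; because $U$ is dense, each projection costs $O(n^2)$, and the result is not confined to a local neighborhood.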

Previous works like ChebyNet (Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering, 2016) and GCN (Semi-Supervised Classification with Graph Convolutional Networks, 2016) addressed the computational cost and tried to induce locality by approximating the spectral filter with polynomial expansions of the Laplacian, avoiding eigendecomposition. However, this approximation limits the flexibility of the filter.

Graph Wavelet Transform in GWNN

GWNN proposes using graph wavelets as a new set of bases for spectral representation. Graph wavelets $\psi_s$ are defined via a scaling matrix $G_s$ applied to the Laplacian eigenvectors $U$: $\psi_s = U G_s U^\top$. The scaling matrix $G_s$ is diagonal with entries $g(s\lambda_i)$, where $g$ is a kernel function (e.g., the heat kernel $e^{\lambda_i s}$) and $s$ is a scaling parameter.
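
As a concrete (if naive) illustration of these definitions, the sketch below builds $\psi_s$ and $\psi_s^{-1}$ by explicit eigendecomposition on a small graph. The function name, the normalized-Laplacian choice, and the $e^{-s\lambda}$ / $e^{s\lambda}$ sign convention are assumptions for illustration; the paper's practical route (the Chebyshev approximation discussed below) avoids this $O(n^3)$ step entirely.

```python
import numpy as np

def graph_wavelet_basis(adj, s=1.0):
    """Build psi_s and psi_s^{-1} by explicit eigendecomposition (O(n^3)).

    Suitable only for small graphs; the paper's fast route uses a
    Chebyshev polynomial approximation instead of eigendecomposition.
    """
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(deg, 1e-12)))
    L = np.eye(n) - d_inv_sqrt @ adj @ d_inv_sqrt      # normalized graph Laplacian
    lam, U = np.linalg.eigh(L)                         # eigenvalues / eigenvectors
    # Heat-kernel scaling matrix G_s = diag(g(s * lambda_i)); the inverse basis
    # is obtained by flipping the sign of s (sign convention assumed here).
    psi_s = U @ np.diag(np.exp(-s * lam)) @ U.T
    psi_s_inv = U @ np.diag(np.exp(s * lam)) @ U.T
    return psi_s, psi_s_inv
```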

Graph wavelet transform offers several advantages for graph convolution:

  1. Efficiency: Graph wavelets $\psi_s$ and their inverses $\psi_s^{-1}$ can be computed efficiently using fast algorithms, such as approximations via Chebyshev polynomials (Wavelets on Graphs via Spectral Graph Theory, 2011), with computational complexity $O(m \times |E|)$, where $m$ is the order of the polynomial and $|E|$ is the number of edges. This avoids the $O(n^3)$ eigendecomposition.
  2. Sparsity: For typical sparse real-world graphs, the matrices $\psi_s$ and $\psi_s^{-1}$ are sparse. This makes the wavelet transform $\hat{x} = \psi_s^{-1} x$ and inverse transform $x = \psi_s \hat{x}$ much more efficient than the dense matrix-vector multiplications involved in GFT ($O(n \times \text{non-zeros}(\psi_s^{-1}))$ vs. $O(n^2)$). Experiments show $\psi_s^{-1}$ can be significantly sparser than $U^\top$.
  3. Locality and Interpretability: Graph wavelets are localized in the vertex domain. Each wavelet $\psi_{si}$ is centered at node $i$ and represents signal diffusion away from it. This intrinsic locality translates to a localized graph convolution defined by the wavelet transform (Equation 3: $x *_{\mathcal{G}} h = \psi_s ((\psi_s^{-1} x) \odot (\psi_s^{-1} h))$; see the sketch after this list). The locality also contributes to better interpretability, as shown by analyzing active wavelets for different features.
  4. Flexible Neighborhood: The scaling parameter $s$ adjusts the range of influence of the wavelets, effectively controlling the size of the neighborhood considered in the convolution in a continuous manner.
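
Continuing the eigendecomposition sketch above, the wavelet-domain convolution of Equation 3 takes only a few lines once $\psi_s$ and $\psi_s^{-1}$ are available (the helper name and the toy path graph are ours, not the paper's):

```python
import numpy as np

def wavelet_convolution(x, h, psi_s, psi_s_inv):
    """Graph convolution in the wavelet domain (Equation 3):
    x *_G h = psi_s ((psi_s^{-1} x) * (psi_s^{-1} h)), with * elementwise."""
    return psi_s @ ((psi_s_inv @ x) * (psi_s_inv @ h))

# Example on a 4-node path graph, reusing graph_wavelet_basis from the sketch above.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
psi_s, psi_s_inv = graph_wavelet_basis(adj, s=1.0)
x = np.random.randn(4)   # node signal
h = np.random.randn(4)   # filter signal
y = wavelet_convolution(x, h, psi_s, psi_s_inv)
```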

Graph Wavelet Neural Network Architecture

A GWNN layer takes an input feature tensor $X^m \in \mathbb{R}^{n \times p}$ and transforms it into an output tensor $X^{m+1} \in \mathbb{R}^{n \times q}$. The original layer definition (Equation 4) uses a spectral filter matrix for each pair of input and output features, leading to a large number of parameters, $O(n \times p \times q)$.

To address the high parameter complexity, especially crucial for semi-supervised learning with limited labels, the paper introduces a key implementation technique: detaching feature transformation from graph convolution. Each layer is split into two stages:

  1. Feature Transformation: A standard linear transformation is applied to the input features: $X^{m'} = X^m W$, where $W \in \mathbb{R}^{p \times q}$ is the weight matrix (Equation 8).
  2. Graph Convolution: The transformed features $X^{m'}$ are convolved using the graph wavelet transform: $X^{m+1} = h(\psi_s \Sigma^m \psi_s^{-1} X^{m'})$, where $\Sigma^m$ is a diagonal matrix representing the learned convolution kernel in the wavelet domain and $h$ is a non-linear activation (Equation 9, corrected from the paper's notation, which implies $\psi_s^{-1}$ acts on $X^{m'}$ first).

This separation reduces the parameter complexity per layer to $O(p \times q)$ for the feature-transformation weights $W$ plus $O(n)$ for the diagonal spectral kernel $\Sigma^m$, totaling $O(n + p \times q)$. This is significantly lower than $O(n \times p \times q)$ and competitive with GCN's $O(p \times q)$, with an additional $O(n)$ for the diagonal kernel.
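
A minimal layer following this detached scheme might look as follows. This is a PyTorch sketch under our own naming, not the authors' released code; it assumes $\psi_s$ and $\psi_s^{-1}$ are precomputed and passed in as tensors, and it leaves the non-linearity $h$ to the caller.

```python
import torch
import torch.nn as nn

class GWNNLayer(nn.Module):
    """Detached GWNN layer: feature transformation (Eq. 8), then wavelet-domain
    graph convolution with a diagonal kernel (Eq. 9). Parameters: p*q + n."""

    def __init__(self, n_nodes, in_feats, out_feats):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(in_feats, out_feats))
        self.kernel = nn.Parameter(torch.ones(n_nodes))   # diagonal of Sigma^m
        nn.init.xavier_uniform_(self.weight)

    def forward(self, x, psi_s, psi_s_inv):
        x = x @ self.weight                  # feature transformation: X^m W
        x = psi_s_inv @ x                    # project into the wavelet domain
        x = self.kernel.unsqueeze(1) * x     # elementwise spectral filtering
        return psi_s @ x                     # project back to the vertex domain
```

In practice $\psi_s$ and $\psi_s^{-1}$ would be sparse tensors and the dense matrix products replaced with sparse ones; the dense version above is only meant to make the two stages explicit.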

For semi-supervised node classification, the paper uses a two-layer GWNN:

  • Layer 1: ReLU activation (Equation 5)
  • Layer 2: Softmax activation for class probabilities (Equation 6)

The model is trained using cross-entropy loss on the labeled nodes.
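
Putting two such layers together gives the semi-supervised classifier described above. Again a sketch, reusing the `GWNNLayer` from the previous block; `train_mask`, `labels`, and the hidden size are placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GWNN(nn.Module):
    """Two-layer GWNN: ReLU after layer 1 (Eq. 5), softmax over layer 2 (Eq. 6)."""

    def __init__(self, n_nodes, in_feats, hidden, n_classes):
        super().__init__()
        self.layer1 = GWNNLayer(n_nodes, in_feats, hidden)
        self.layer2 = GWNNLayer(n_nodes, hidden, n_classes)

    def forward(self, x, psi_s, psi_s_inv):
        x = F.relu(self.layer1(x, psi_s, psi_s_inv))
        return self.layer2(x, psi_s, psi_s_inv)   # logits; softmax folded into the loss

# Training step: cross-entropy on the labeled nodes only.
# logits = model(features, psi_s, psi_s_inv)
# loss = F.cross_entropy(logits[train_mask], labels[train_mask])
```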

Implementation Considerations and Experiments

  • Fast Wavelet Computation: The practical implementation relies on the Chebyshev polynomial approximation of $\psi_s$ and $\psi_s^{-1}$ to avoid eigendecomposition (Appendix D); a minimal sketch of this approximation follows this list.
  • Sparsity Threshold: For computational efficiency, elements of $\psi_s$ and $\psi_s^{-1}$ smaller than a threshold $t$ are set to zero.
  • Hyperparameter Tuning: The scale parameter $s$ and sparsity threshold $t$ are tuned on a validation set. The paper observes that accuracy generally increases with $s$ up to a point and then decreases, while $t$ has less influence (Appendix B).
  • Datasets: Experiments are conducted on standard citation network datasets: Cora, Citeseer, and Pubmed, using the same semi-supervised split as GCN (20 labels per class for training).
  • Baselines: GWNN is compared against traditional methods, spectral methods (Spectral CNN, ChebyNet, GCN), and spatial methods (MoNet).
  • Results:
    • Detaching feature transformation is shown to be effective, especially on datasets with low label rates like Pubmed, improving accuracy and significantly reducing parameters (Table 2).
    • GWNN consistently outperforms Spectral CNN, ChebyNet, GCN, and MoNet on node classification accuracy across all three datasets (Table 3).
    • Sparsity analysis confirms that wavelet transform matrices and projected signals are much sparser than their Fourier counterparts on the Cora dataset (Table 4).
    • Interpretability analysis demonstrates how the localized nature of wavelets allows interpreting projected signals as correlations between features (words) and nodes (documents), with top-activating nodes for a specific word concentrating in relevant parts of the graph (Figure 3).
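
Since the practical implementation relies on the Chebyshev approximation noted in the first item above, here is a minimal sketch of applying $g(sL)$ to a signal without eigendecomposition, following the standard truncated Chebyshev expansion of Hammond et al. The coefficient quadrature, the $\lambda_{\max} = 2$ default for the normalized Laplacian, and all names are assumptions for illustration.

```python
import numpy as np
from scipy import sparse

def chebyshev_coeffs(g, order, lam_max, num_points=1000):
    """Chebyshev expansion coefficients of g on [0, lam_max]."""
    theta = (np.arange(num_points) + 0.5) * np.pi / num_points
    lam = lam_max * (np.cos(theta) + 1.0) / 2.0          # map [-1, 1] -> [0, lam_max]
    return np.array([2.0 / num_points * np.sum(g(lam) * np.cos(k * theta))
                     for k in range(order + 1)])

def apply_wavelet_operator(L, x, g, order=20, lam_max=2.0):
    """Approximate g(L) @ x (i.e. psi_s x, or psi_s^{-1} x with the inverse kernel)
    via an order-m Chebyshev recurrence: O(order * |E|) sparse matvecs.
    L is the (sparse) normalized graph Laplacian."""
    c = chebyshev_coeffs(g, order, lam_max)
    L_tilde = (2.0 / lam_max) * L - sparse.identity(L.shape[0], format="csr")
    t_prev, t_curr = x, L_tilde @ x                      # T_0 x and T_1 x
    y = 0.5 * c[0] * t_prev + c[1] * t_curr
    for k in range(2, order + 1):
        t_next = 2.0 * (L_tilde @ t_curr) - t_prev       # Chebyshev recurrence
        y = y + c[k] * t_next
        t_prev, t_curr = t_curr, t_next
    return y

# Usage: heat-kernel wavelet with scale s = 0.7 (sign convention assumed).
# y = apply_wavelet_operator(L_sparse, x, lambda lam: np.exp(-0.7 * lam))
```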

Practical Implications and Applications

The GWNN model provides a practical approach for applying deep learning to graph-structured data, particularly in semi-supervised settings where labeled data is scarce. Its key strengths for implementation are:

  1. Scalability: The use of efficient wavelet computation (via Chebyshev approximation) and the detached architecture make GWNN applicable to larger graphs than standard spectral methods requiring full eigendecomposition. The $O(m|E|)$ complexity of the wavelet basis computation and the efficient sparse matrix multiplications are key here.
  2. Improved Performance: The experiments demonstrate state-of-the-art performance on benchmark node classification tasks, suggesting that graph wavelets provide a more suitable basis for defining graph convolution than Fourier bases or their polynomial approximations.
  3. Reduced Overfitting: The significantly reduced parameter count due to the detached architecture helps mitigate overfitting, which is crucial in semi-supervised scenarios with limited labels.
  4. Interpretability: The localized nature of wavelets offers insights into how features relate to nodes and how information propagates through the network during convolution. This can be valuable for debugging and understanding model predictions.

GWNN can be applied to various graph-based tasks beyond semi-supervised classification, such as graph regression, link prediction, and graph representation learning, especially when locality, interpretability, and efficiency on potentially large, sparse graphs are important considerations. However, like other spectral methods, GWNN is inherently transductive (tied to a fixed graph structure defined by the precomputed wavelet bases), although extensions to inductive settings might be possible by adapting the wavelet computation or incorporating sampling strategies. The memory required to store the potentially large $\psi_s$ and $\psi_s^{-1}$ matrices (even if sparse) could still be a factor for extremely large graphs, prompting strategies such as computing and storing only specific wavelets or using on-the-fly approximation methods.