
HOG-Diff: Higher-Order Guided Diffusion for Graph Generation

Published 6 Feb 2025 in cs.LG, cs.AI, cs.SI, and physics.soc-ph | (2502.04308v1)

Abstract: Graph generation is a critical yet challenging task as empirical analyses require a deep understanding of complex, non-Euclidean structures. Although diffusion models have recently made significant achievements in graph generation, these models typically adapt from the frameworks designed for image generation, making them ill-suited for capturing the topological properties of graphs. In this work, we propose a novel Higher-order Guided Diffusion (HOG-Diff) model that follows a coarse-to-fine generation curriculum and is guided by higher-order information, enabling the progressive generation of plausible graphs with inherent topological structures. We further prove that our model exhibits a stronger theoretical guarantee than classical diffusion frameworks. Extensive experiments on both molecular and generic graph generation tasks demonstrate that our method consistently outperforms or remains competitive with state-of-the-art baselines. Our code is available at https://github.com/Yiminghh/HOG-Diff.

Summary

  • The paper introduces HOG-Diff, which employs a coarse-to-fine diffusion curriculum guided by higher-order topology to generate structurally sound graphs.
  • It provides theoretical evidence of faster convergence in score matching and establishes tighter reconstruction error bounds than conventional diffusion models.
  • Experimental evaluations on molecular and generic graph datasets demonstrate that HOG-Diff achieves state-of-the-art performance using varied topological guides.

The paper introduces a novel Higher-Order Guided Diffusion (HOG-Diff) model for graph generation, addressing limitations in existing methods that struggle to capture the topological properties of graphs, particularly higher-order structures. HOG-Diff employs a coarse-to-fine generation curriculum guided by higher-order information, enabling the progressive generation of plausible graphs with inherent topological structures. The authors prove that HOG-Diff enjoys stronger theoretical guarantees than classical diffusion frameworks.

The paper details the following contributions:

  • A coarse-to-fine graph generation curriculum guided by higher-order topological information using the Ornstein-Uhlenbeck (OU) diffusion bridge.
  • Theoretical analysis revealing that HOG-Diff achieves faster convergence during score-matching and a sharper reconstruction error bound compared to classical methods.
  • Experimental evaluations demonstrating that HOG-Diff achieves state-of-the-art graph generation performance across various datasets.

Graphs are represented as $\bm{G} \triangleq (\bm{V},\bm{E},\bm{X})$, where:

  • $\bm{V}$ is the node set.
  • $\bm{E}\subseteq \bm{V}\times\bm{V}$ is the edge set.
  • $\bm{X}$ is the node feature matrix.

Higher-order networks, such as hypergraphs, simplicial complexes, and cell complexes, capture multi-way interactions among entities. Cell complexes, fundamental in algebraic topology, provide a flexible generalization of pairwise graphs. A regular cell complex is defined as a topological space $\mathcal{S}$ partitioned into subspaces (cells) $\{x_\alpha\}_{\alpha\in P_\mathcal{S}}$, where $P_\mathcal{S}$ is an index set, satisfying specific conditions related to neighborhoods, boundaries, homeomorphisms to $\mathbb{R}^{n_\alpha}$, and regularity.

Score-based diffusion models generate samples from an unknown target data distribution $p(\mathbf{x}_0)$ by progressively corrupting data with noise and training a neural network to reverse this process. The time-dependent forward process is described by the stochastic differential equation (SDE):

$\mathrm{d}\mathbf{x}_t=\mathbf{f}_t\left(\mathbf{x}_t\right)\mathrm{d}t+g_t\mathrm{d}\mathbf{w}_t$,

where:

  • $\mathbf{x}_t$ is the state at time $t$.
  • $\mathbf{f}_t: \mathbb{R}^n \to \mathbb{R}^n$ is a vector-valued drift function.
  • $g_t: [0,T]\to \mathbb{R}$ is a scalar diffusion coefficient.
  • $\mathbf{w}_t$ is a Wiener process.
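As an illustration, the forward corruption can be simulated with a simple Euler-Maruyama discretization. The sketch below assumes a VP-style choice $\mathbf{f}_t(\mathbf{x})=-\tfrac{1}{2}\beta\mathbf{x}$ and $g_t=\sqrt{\beta}$ (an illustrative schedule, not necessarily the paper's) and checks that the terminal distribution approaches a standard normal:

```python
import math
import random

random.seed(0)

def forward_sde_sample(x0, beta=1.0, T=8.0, n_steps=400):
    """Euler-Maruyama simulation of a VP-style forward SDE
    dx_t = -0.5*beta*x_t dt + sqrt(beta) dW_t, which drives any x0
    toward a standard normal as t grows."""
    dt = T / n_steps
    x = x0
    for _ in range(n_steps):
        # drift term f_t(x_t) dt plus diffusion term g_t dW_t, dW_t ~ N(0, dt)
        x += -0.5 * beta * x * dt + math.sqrt(beta * dt) * random.gauss(0.0, 1.0)
    return x

# Corrupt many copies of the same data point; the terminal sample moments
# should be close to those of N(0, 1).
samples = [forward_sde_sample(3.0) for _ in range(2000)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(round(mean, 3), round(var, 3))
```
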

The reverse SDE is:

$\mathrm{d}\mathbf{x}_t=\left[\mathbf{f}_t(\mathbf{x}_t)-g_t^2 \nabla_{\mathbf{x}_t} \log p_t(\mathbf{x}_t)\right]\mathrm{d}t + g_t \mathrm{d}\bar{\mathbf{w}}$,

where:

  • $p_t(\cdot)$ is the probability density function of $\mathbf{x}_t$.
  • $\bar{\mathbf{w}}$ is a reverse-time Wiener process.
  • $\nabla_{\mathbf{x}_t} \log p_t(\mathbf{x}_t)$ is the score function.

The score function is parameterized as a neural network $\bm{s}_{\bm{\theta}}(\mathbf{x}_t,t)$ and trained using the score-matching technique with the loss function:

$\ell(\bm{\theta}) \triangleq \mathbb{E}_{t, \mathbf{x}_t}\left[\omega(t)\left\|\bm{s}_{\bm{\theta}}(\mathbf{x}_t,t) - \nabla_{\mathbf{x}_t} \log p_t(\mathbf{x}_t)\right\|^2\right] \propto \mathbb{E}_{t,\mathbf{x}_0,\mathbf{x}_t}\left[\omega(t)\left\|\bm{s}_{\bm{\theta}}(\mathbf{x}_t, t) - \nabla_{\mathbf{x}_t} \log p_t(\mathbf{x}_t\mid\mathbf{x}_0)\right\|^2\right]$,

where $\omega(t)$ is a weighting function.
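The right-hand expectation is trainable precisely because, for Gaussian perturbation kernels, the conditional score $\nabla_{\mathbf{x}_t}\log p_t(\mathbf{x}_t\mid\mathbf{x}_0)$ has a closed form. A minimal scalar sketch, assuming an illustrative kernel $p_t(\mathbf{x}_t\mid\mathbf{x}_0)=\mathcal{N}(\alpha\mathbf{x}_0,\sigma_t^2)$:

```python
import math

def gaussian_log_pdf(x, mean, var):
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

def conditional_score(x_t, x0, alpha, var):
    """Score of the assumed Gaussian perturbation kernel
    p_t(x_t | x_0) = N(alpha*x0, var):  d/dx_t log p = -(x_t - alpha*x0)/var.
    This closed form serves as the regression target in denoising score matching."""
    return -(x_t - alpha * x0) / var

# Sanity check against a central finite difference of log p_t(x_t | x_0).
x0, alpha, var, x_t = 1.5, 0.8, 0.25, 0.4
eps = 1e-5
numeric = (gaussian_log_pdf(x_t + eps, alpha * x0, var)
           - gaussian_log_pdf(x_t - eps, alpha * x0, var)) / (2 * eps)
analytic = conditional_score(x_t, x0, alpha, var)
print(round(analytic, 6), round(numeric, 6))
```
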

Doob’s $h$-transform modifies stochastic processes to satisfy specific terminal conditions by introducing an $h$-function into the drift term of an SDE. Given the SDE:

$\mathrm{d}\mathbf{x}_t=\mathbf{f}_t\left(\mathbf{x}_t\right)\mathrm{d}t+g_t\mathrm{d}\mathbf{w}_t$,

Doob’s $h$-transform alters the SDE to:

$\mathrm{d}\mathbf{x}_t=\left[\mathbf{f}_t\left(\mathbf{x}_t\right)+g_t^2 \bm{h}(\mathbf{x}_t,t,\mathbf{x}_T,T)\right]\mathrm{d}t+g_t\mathrm{d}\mathbf{w}_t$,

where $\bm{h}(\mathbf{x}_t,t,\mathbf{x}_T,T)=\nabla_{\mathbf{x}_t} \log p(\mathbf{x}_T\mid \mathbf{x}_t)$.
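A minimal illustration of the $h$-transform, using pure Brownian motion ($\mathbf{f}_t \equiv 0$) as the base process: there $p(\mathbf{x}_T\mid\mathbf{x}_t)=\mathcal{N}(\mathbf{x}_t, g^2(T-t))$, so the transformed drift $g^2\bm{h}$ yields the classical Brownian bridge, whose paths are pinned to $\mathbf{x}_T$. All numerical settings are illustrative assumptions:

```python
import math
import random

random.seed(1)

def doob_bridge_sample(x0, x_T, T=1.0, g=1.0, n_steps=1000):
    """Doob h-transform of dx_t = g dW_t. Since p(x_T | x_t) = N(x_t, g^2 (T-t)),
    h(x_t, t, x_T, T) = (x_T - x_t) / (g^2 (T - t)), and the transformed SDE
    dx_t = g^2 * h dt + g dW_t is a Brownian bridge pinned at x_T."""
    dt = T / n_steps
    x = x0
    for i in range(n_steps - 1):   # stop one step early: h is singular at t = T
        t = i * dt
        h = (x_T - x) / (g ** 2 * (T - t))
        x = x + (g ** 2) * h * dt + g * math.sqrt(dt) * random.gauss(0.0, 1.0)
    return x

# A path started at -2 is driven to the terminal condition x_T = 3.
end = doob_bridge_sample(x0=-2.0, x_T=3.0)
print(round(end, 3))
```
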

To implement the coarse-to-fine generation curriculum, the paper introduces cell complex filtering (CCF). Given a graph $\bm{G} = (\bm{V},\bm{E})$ and its associated cell complex $\mathcal{S}$, the CCF operation produces a filtered graph $\bm{G}^\prime = (\bm{V}^\prime,\bm{E}^\prime)$, where $\bm{V}^{\prime} = \{ v \in \bm{V} \mid \exists\, x_\alpha \in \mathcal{S} : v \in x_\alpha \}$ and $\bm{E}^{\prime} = \{ (u, v) \in \bm{E} \mid \exists\, x_\alpha \in \mathcal{S} : u,v \in x_\alpha\}$.
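A simplified sketch of the CCF operation, under the assumption that each cell is represented by its set of constituent nodes (the paper works with full cell complexes, e.g. rings lifted to 2-cells):

```python
def cell_complex_filter(nodes, edges, cells):
    """Keep only the nodes that belong to some cell, and the edges whose
    both endpoints lie inside a common cell. `cells` is assumed to be a
    list of node sets; this is a simplification of the paper's construction."""
    kept_nodes = {v for v in nodes if any(v in cell for cell in cells)}
    kept_edges = {(u, v) for (u, v) in edges
                  if any(u in cell and v in cell for cell in cells)}
    return kept_nodes, kept_edges

# Toy graph: a triangle {0, 1, 2} treated as one 2-cell, plus a pendant node 3.
nodes = [0, 1, 2, 3]
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
cells = [{0, 1, 2}]
v_f, e_f = cell_complex_filter(nodes, edges, cells)
print(v_f, e_f)   # the pendant node and its edge are filtered out
```
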

The forward and reverse diffusion processes in HOG-Diff are divided into $K$ hierarchical time windows, denoted $\{[\tau_{k-1},\tau_k]\}_{k=1}^K$, where $0 = \tau_0 < \cdots < \tau_{k-1}< \tau_k < \cdots < \tau_K = T$. The generation process factorizes the distribution of the final graph $\bm{G}_0$ into a product of conditional distributions across these time windows:

$p(\bm{G}_0)=p(\bm{G}_0\mid\bm{G}_{\tau_1})\,p(\bm{G}_{\tau_1}\mid\bm{G}_{\tau_2}) \cdots p(\bm{G}_{\tau_{K-1}}\mid\bm{G}_{T})$.

During each time window $[\tau_{k-1}, \tau_k]$, the evolution of the graph is governed by the forward SDE:

$\mathrm{d}\bm{G}_t^{(k)}=\mathbf{f}_{k,t}(\bm{G}_t^{(k)})\mathrm{d}t+g_{k,t}\mathrm{d}\bm{W}_t, \quad t \in [\tau_{k-1}, \tau_k]$.

The guided diffusion is based on the generalized OU process governed by the SDE:

$\mathbb{Q}: \mathrm{d}\bm{G}_t = \theta_t(\bm{\mu} -\bm{G}_t)\mathrm{d}t + g_t(\bm{G}_t)\mathrm{d}\bm{W}_t$,

where:

  • $\bm{\mu}=\bm{G}_{\tau_k}$ is the target terminal state.
  • $\theta_t$ is a scalar drift coefficient.
  • $g_t$ is the diffusion coefficient.

Additionally, $g_t^2/\theta_t = 2\sigma^2$, where $\sigma^2$ is a given constant scalar.

The transition probability admits a closed-form solution:

$p(\bm{G}_{t}\mid \bm{G}_s)=\mathcal{N}(\mathbf{m}_{s:t},v_{s:t}^{2}\bm{I}) = \mathcal{N}\!\left(\bm{\mu}+\left(\bm{G}_s-\bm{\mu}\right)e^{-\bar{\theta}_{s:t}},\; \sigma^2\left(1-e^{-2\bar{\theta}_{s:t}}\right)\bm{I}\right)$,

where $\bar{\theta}_{s:t}=\int_s^t\theta_z\,\mathrm{d}z$.
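The closed form can be sanity-checked against a direct Euler-Maruyama simulation of the OU SDE. The sketch below uses a constant $\theta_t$ (so $\bar{\theta}_{s:t}=\theta\,(t-s)$), the coupling $g_t^2 = 2\sigma^2\theta_t$, and toy scalar values:

```python
import math
import random

random.seed(2)

def ou_euler(x_s, mu, theta, sigma, t, n_steps=200):
    """Euler-Maruyama simulation of the OU SDE
    dG = theta*(mu - G) dt + g dW, with g^2 = 2*sigma^2*theta (constant theta)."""
    dt = t / n_steps
    g = math.sqrt(2.0 * sigma ** 2 * theta)
    x = x_s
    for _ in range(n_steps):
        x += theta * (mu - x) * dt + g * math.sqrt(dt) * random.gauss(0.0, 1.0)
    return x

theta, sigma, mu, x_s, t = 1.0, 0.5, 2.0, -1.0, 1.0
samples = [ou_euler(x_s, mu, theta, sigma, t) for _ in range(3000)]
emp_mean = sum(samples) / len(samples)

# Closed-form transition moments from the text (theta_bar = theta * t here).
theta_bar = theta * t
m = mu + (x_s - mu) * math.exp(-theta_bar)
v = sigma ** 2 * (1.0 - math.exp(-2.0 * theta_bar))
print(round(emp_mean, 3), round(m, 3), round(v, 3))
```

The empirical mean of the simulated endpoints should match the analytic mean $\mathbf{m}_{s:t}$ up to Monte Carlo and discretization error.
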

When applied to the generalized OU process, Doob’s $h$-transform drives the diffusion process toward a Dirac distribution centered at $\bm{G}_{\tau_k}$.

The conditional marginal distribution $p(\bm{G}_t\mid\bm{G}_{\tau_k})$ evolves according to the SDE:

$\mathrm{d}\bm{G}_t = \theta_t \left( 1 + \frac{2}{e^{2\bar{\theta}_{t:\tau_k}}-1} \right)(\bm{G}_{\tau_k} - \bm{G}_t)\, \mathrm{d}t + g_{k,t}\, \mathrm{d}\bm{W}_t$.
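Discretizing this bridge SDE with a constant $\theta_t$ (so $\bar{\theta}_{t:\tau_k} = \theta\,(\tau_k - t)$) shows the pinning behavior: the drift coefficient diverges as $t \to \tau_k$, forcing the path onto the terminal state. A toy scalar simulation (all parameter values are illustrative):

```python
import math
import random

random.seed(3)

def ou_bridge(x0, target, theta=1.0, g=0.05, tau=1.0, n_steps=1000):
    """Simulate the h-transformed OU SDE with constant theta. The amplified
    drift theta*(1 + 2/(exp(2*theta*(tau - t)) - 1)) pins G_tau to `target`."""
    dt = tau / n_steps
    x = x0
    for i in range(n_steps - 1):   # stop a step early: the drift is singular at t = tau
        remaining = tau - i * dt
        coeff = theta * (1.0 + 2.0 / (math.exp(2.0 * theta * remaining) - 1.0))
        x += coeff * (target - x) * dt + g * math.sqrt(dt) * random.gauss(0.0, 1.0)
    return x

# A path started at -2 ends (up to discretization noise) at the target 3.
end = ou_bridge(x0=-2.0, target=3.0)
print(round(end, 3))
```
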

Generating graph adjacency matrices is challenging due to the non-uniqueness of graph representations, the sparsity and Pareto-like (heavy-tailed) structure of real graphs, and quadratic scaling with the number of nodes. To address these issues, the paper injects noise in the eigenvalue domain of the graph Laplacian $\bm{L}=\bm{D}-\bm{A}$ rather than in the adjacency matrix $\bm{A}$, where $\bm{D}$ denotes the diagonal degree matrix.

As a symmetric positive semi-definite matrix, the graph Laplacian can be diagonalized as $\bm{L} = \bm{U} \bm{\Lambda} \bm{U}^\top$, where:

  • $\bm{U} = [\bm{u}_1,\cdots,\bm{u}_n]$ is the orthogonal matrix of eigenvectors.
  • $\bm{\Lambda} = \operatorname{diag}(\lambda_1,\cdots,\lambda_n)$ is the diagonal matrix of the corresponding eigenvalues.
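The spectral construction can be sketched in a few lines of NumPy: build $\bm{L} = \bm{D} - \bm{A}$ for a toy graph, diagonalize it, and confirm the properties used above (symmetric PSD, orthogonal eigendecomposition). The 4-node path graph is an assumption for illustration:

```python
import numpy as np

# Adjacency matrix of a 4-node path graph (toy example).
A = np.array([[0., 1., 0., 0.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [0., 0., 1., 0.]])
D = np.diag(A.sum(axis=1))   # diagonal degree matrix
L = D - A                    # combinatorial graph Laplacian

# Symmetric PSD, so it admits an orthogonal eigendecomposition L = U diag(lam) U^T.
lam, U = np.linalg.eigh(L)

assert np.all(lam >= -1e-10)                   # nonnegative spectrum (PSD)
assert np.allclose(U @ np.diag(lam) @ U.T, L)  # exact reconstruction
assert np.allclose(U @ U.T, np.eye(4))         # orthogonality of U
print(np.round(lam, 4))
```

For a connected graph, the smallest eigenvalue is exactly zero (with the all-ones eigenvector).
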

Consequently, the reverse-time SDE is split into two parts that share drift and diffusion coefficients:

$\left\{\begin{aligned} \mathrm{d}\bm{X}_t &= \left[\mathbf{f}_{k,t}(\bm{X}_t) - g_{k,t}^2 \nabla_{\bm{X}} \log p_t(\bm{G}_t \mid \bm{G}_{\tau_k})\right]\mathrm{d}\bar{t} + g_{k,t}\,\mathrm{d}\bar{\bm{W}}_t^1 \\ \mathrm{d}\bm{\Lambda}_t &= \left[\mathbf{f}_{k,t}(\bm{\Lambda}_t) - g_{k,t}^2 \nabla_{\bm{\Lambda}} \log p_t(\bm{G}_t \mid \bm{G}_{\tau_k})\right]\mathrm{d}\bar{t} + g_{k,t}\,\mathrm{d}\bar{\bm{W}}_t^2 \end{aligned}\right.$

To approximate the score functions, a neural network $\bm{s}^{(k)}_{\bm{\theta}}(\bm{G}_t, \bm{G}_{\tau_k},t)$ is employed, composed of a node output $\bm{s}^{(k)}_{\bm{\theta},\bm{X}}(\bm{G}_t, \bm{G}_{\tau_k},t)$ and a spectrum output $\bm{s}^{(k)}_{\bm{\theta},\bm{\Lambda}}(\bm{G}_t, \bm{G}_{\tau_k},t)$. The model is optimized by minimizing the loss function:

$\ell^{(k)}(\bm{\theta})=\mathbb{E}_{t,\bm{G}_t,\bm{G}_{\tau_{k-1}},\bm{G}_{\tau_k}}\left\{\omega(t) \left[c_1\left\|\bm{s}^{(k)}_{\bm{\theta},\bm{X}} - \nabla_{\bm{X}} \log p_t(\bm{G}_t \mid \bm{G}_{\tau_k})\right\|_2^2 +c_2 \left\|\bm{s}^{(k)}_{\bm{\theta},\bm{\Lambda}} - \nabla_{\bm{\Lambda}} \log p_t(\bm{G}_t \mid \bm{G}_{\tau_k})\right\|_2^2\right]\right\},$

where $\omega(t)$ is a positive weighting function, and $c_1, c_2$ control the relative importance of the vertex and spectrum terms.

The paper provides theoretical evidence for the efficacy of HOG-Diff, demonstrating that the framework achieves faster convergence in score matching and tighter reconstruction error bounds than standard graph diffusion models. Suppose the loss function $\ell^{(k)}(\bm{\theta})$ above is $\beta$-smooth and satisfies the $\mu$-PL condition in the ball $B(\bm{\theta}_0, R)$. Then the expected loss at the $i$-th training iteration satisfies:

$\mathbb{E}\left[\ell^{(k)}(\bm{\theta}_i)\right] \leq \left(1-\frac{b\mu^2}{\beta N(\beta N^2+\mu(b-1))}\right)^i \ell^{(k)}\left(\bm{\theta}_0\right)$,

where:

  • $N$ denotes the size of the training dataset.
  • $b$ is the mini-batch size.

The paper states that $\beta_{\text{HOG-Diff}} \leq \beta_{\text{classical}}$, implying that the distribution learned by the proposed framework converges to the target distribution faster than that of classical generative models.

The expected reconstruction error at each stage of the generation process is defined as $\mathcal{E}(t)=\mathbb{E}\|\bar{\bm{G}}_t-\widehat{\bm{G}}_t\|^2$, where $\bar{\bm{G}}_t$ denotes the data reconstructed using the ground-truth score $\nabla \log p_t(\cdot)$ and $\widehat{\bm{G}}_t$ the data reconstructed with the learned score function $\bm{s}_{\bm{\theta}}$. Under appropriate Lipschitz and boundedness assumptions, the reconstruction error of HOG-Diff satisfies the bound:

$\mathcal{E}(0) \leq \alpha(0)\exp\left(\int_0^{\tau_1} \gamma(s)\, \mathrm{d}s\right)$,

where:

  • $\alpha(0)=C^2 \ell^{(1)}(\bm{\theta}) \int_0^{\tau_1} g_{1,s}^4\, \mathrm{d}s + C\, \mathcal{E}(\tau_1) \int_0^{\tau_1} h_{1,s}^2\, \mathrm{d}s$.
  • $\gamma(s) = C^2 g_{1,s}^4 \left\|\bm{s}_{\bm{\theta}}(\cdot,s)\right\|_{\mathrm{lip}}^2 + C \left\|h_{1,s}\right\|_{\mathrm{lip}}^2$.
  • $h_{1,s} = \theta_s \left(1 + \frac{2}{e^{2\bar{\theta}_{s:\tau_1}}-1}\right)$.

This reconstruction error bound is sharper than that of classical graph generation models.

The denoising network $\bm{s}^{(k)}_{\bm{\theta}}(\bm{G}_t, \bm{G}_{\tau_k},t)$ estimates the score functions. The network comprises a graph convolutional network (GCN) for local feature aggregation and a graph transformer network (ATTN) for global information extraction. The outputs of these modules are fused with time information through a Feature-wise Linear Modulation (FiLM) layer. The resulting representations are concatenated into a unified hidden embedding, which is processed through separate multilayer perceptrons (MLPs) to produce predictions for $\nabla_{\bm{X}} \log p(\bm{G}_t\mid\bm{G}_{\tau_k})$ and $\nabla_{\bm{\Lambda}} \log p(\bm{G}_t\mid\bm{G}_{\tau_k})$, respectively.

The paper experimentally validates HOG-Diff against state-of-the-art baselines for both molecular and generic graph generation. For molecular generation, evaluations are conducted on the QM9 and ZINC250k datasets, using metrics such as Neighborhood Subgraph Pairwise Distance Kernel (NSPDK) MMD, Fréchet ChemNet Distance (FCD), Validity (Val.), Validity without correction (Val. w/o corr.), Uniqueness (Uni.), and Novelty (Nov.).

For generic graph generation, the model is evaluated over three common generic graph datasets: Community-small, Ego-small, and Enzymes. The paper employs the maximum mean discrepancy (MMD) to quantify the distribution differences across key graph statistics, including degree (Deg.), clustering coefficient (Clus.), and the number of occurrences of orbits with 4 nodes (Orbit).
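For intuition, here is a toy scalar MMD between two pooled degree samples with a Gaussian kernel. This is a simplification: the benchmarks compare full graph-statistic distributions with specific kernels and bandwidths, and the toy degree values are assumptions:

```python
import math

def gaussian_kernel(x, y, bandwidth=1.0):
    return math.exp(-(x - y) ** 2 / (2.0 * bandwidth ** 2))

def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of squared MMD between two samples of a scalar graph
    statistic (e.g., node degrees pooled over a set of graphs)."""
    k = lambda a, b: gaussian_kernel(a, b, bandwidth)
    xx = sum(k(a, b) for a in xs for b in xs) / len(xs) ** 2
    yy = sum(k(a, b) for a in ys for b in ys) / len(ys) ** 2
    xy = sum(k(a, b) for a in xs for b in ys) / (len(xs) * len(ys))
    return xx + yy - 2.0 * xy

ref_degrees = [1, 2, 2, 3, 2, 1]   # toy reference degrees
same = mmd_squared(ref_degrees, ref_degrees)          # identical samples -> 0
diff = mmd_squared(ref_degrees, [5, 6, 6, 7, 6, 5])   # shifted sample -> large
print(same, round(diff, 3))
```

A generated graph set matching the test distribution drives such MMD scores toward zero, which is why lower is better on Deg., Clus., and Orbit.
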

Ablation studies are conducted using different types of topological information as guides: structures derived from 2-cell filtering (Cell), peripheral structures obtained by removing cell components (Peripheral), and Gaussian random noise (Noise). Results show that cell structures are more effective in guiding the generation.

The paper concludes by emphasizing HOG-Diff's ability to generate high-quality samples and its potential for improving interpretability in graph generation.


Authors (2)
