- The paper introduces HOG-Diff, which employs a coarse-to-fine diffusion curriculum guided by higher-order topology to generate structurally sound graphs.
- It provides theoretical evidence of faster convergence in score matching and establishes tighter reconstruction error bounds than conventional diffusion models.
- Experimental evaluations on molecular and generic graph datasets demonstrate that HOG-Diff achieves state-of-the-art performance using varied topological guides.
The paper introduces a novel Higher-Order Guided Diffusion (HOG-Diff) model for graph generation, addressing limitations in existing methods that struggle to capture the topological properties of graphs, particularly higher-order structures. HOG-Diff employs a coarse-to-fine generation curriculum guided by higher-order information, enabling the progressive generation of plausible graphs with inherent topological structures. The authors prove that HOG-Diff enjoys stronger theoretical guarantees than classical diffusion frameworks.
The paper details the following contributions:
- A coarse-to-fine graph generation curriculum guided by higher-order topological information using the Ornstein-Uhlenbeck (OU) diffusion bridge.
- Theoretical analysis revealing that HOG-Diff achieves faster convergence during score-matching and a sharper reconstruction error bound compared to classical methods.
- Experimental evaluations demonstrating that HOG-Diff achieves state-of-the-art graph generation performance across various datasets.
Graphs are represented as G≜(V,E,X), where:
- V is the node set.
- E⊆V×V represents the edges.
- X is the node feature matrix.
Higher-order networks, such as hypergraphs, simplicial complexes, and cell complexes, capture multi-way interactions among entities. Cell complexes, fundamental in algebraic topology, provide a flexible generalization of pairwise graphs. A regular cell complex is defined as a topological space S partitioned into subspaces (cells) {xα}α∈PS, where PS is an index set, satisfying specific conditions related to neighborhoods, boundaries, homeomorphisms to Rnα, and regularity.
Score-based diffusion models generate samples from an unknown target data distribution p(x0) by progressively corrupting data with noise and training a neural network to reverse this process. The time-dependent forward process is described by the stochastic differential equation (SDE):
dxt=ft(xt)dt+gtdwt,
where:
- xt is the state at time t.
- ft:Rn→Rn is a vector-valued drift function.
- gt:[0,T]→R is a scalar diffusion coefficient.
- wt is a Wiener process.
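The forward SDE above can be simulated with a simple Euler–Maruyama scheme. The sketch below is illustrative only: it assumes a VP-style drift $f_t(\mathbf{x}) = -\tfrac{\beta}{2}\mathbf{x}$ and constant diffusion $g_t = 1$, not the paper's actual noise schedule.

```python
import numpy as np

def euler_maruyama_forward(x0, drift, diffusion, T=1.0, n_steps=1000, rng=None):
    """Simulate dx_t = f_t(x_t) dt + g_t dw_t with the Euler-Maruyama scheme."""
    rng = rng or np.random.default_rng(0)
    dt = T / n_steps
    x = np.array(x0, dtype=float)
    for i in range(n_steps):
        t = i * dt
        # One Euler-Maruyama step: deterministic drift + Gaussian increment.
        x = x + drift(x, t) * dt + diffusion(t) * np.sqrt(dt) * rng.standard_normal(x.shape)
    return x

# Illustrative choices (not the paper's schedule): VP-style drift, unit diffusion.
beta = 2.0
xT = euler_maruyama_forward(
    x0=np.full(5, 3.0),
    drift=lambda x, t: -0.5 * beta * x,   # f_t(x) = -(beta/2) x pulls toward 0
    diffusion=lambda t: 1.0,              # g_t = 1
)
```

Running the process for longer pushes the samples toward the drift's stationary Gaussian, which is what makes the terminal distribution easy to sample from in reverse.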
The reverse SDE is:
$\mathrm{d} \mathbf{x}_t=\left[\mathbf{f}_t(\mathbf{x}_t)-g_t^2 \nabla_{\mathbf{x}_t} \log p_t(\mathbf{x}_t)\right]\mathrm{d}t + g_t\, \mathrm{d} \bar{\mathbf{w}}_t$,
where:
- $p_t(\cdot)$ is the probability density function of $\mathbf{x}_t$.
- $\bar{\mathbf{w}}_t$ is a reverse-time Wiener process.
- $\nabla_{\mathbf{x}_t}\log p_t(\mathbf{x}_t)$ is the score function.
The score function is parameterized as a neural network $\bm{s}_{\bm{\theta}}(\mathbf{x}_t,t)$ and trained using the score-matching technique with the loss function:
$\ell(\bm{\theta}) \triangleq \mathbb{E}_{t, \mathbf{x}_t}\left[\omega(t)\left\|\bm{s}_{\bm{\theta}}(\mathbf{x}_t,t) - \nabla_{\mathbf{x}_t} \log p_t(\mathbf{x}_t)\right\|^2\right] \propto \mathbb{E}_{t,\mathbf{x}_0,\mathbf{x}_t}\left[ \omega(t) \left\Vert \bm{s}_{\bm{\theta}}(\mathbf{x}_t, t) - \nabla_{\mathbf{x}_t} \log p_t (\mathbf{x}_t\mid\mathbf{x}_0)\right\Vert^2\right]$,
where ω(t) is a weighting function.
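The right-hand expectation is tractable because the conditional score has a closed form for Gaussian perturbation kernels. A minimal numeric sketch, assuming a kernel $p_t(\mathbf{x}_t\mid\mathbf{x}_0)=\mathcal{N}(\alpha_t \mathbf{x}_0, \sigma_t^2\bm{I})$ with hypothetical values $\alpha_t=0.9$, $\sigma_t=0.2$:

```python
import numpy as np

def dsm_loss(score_fn, x0, alpha_t, sigma_t, rng=None):
    """One-sample denoising score-matching loss for a Gaussian kernel
    p_t(x_t | x_0) = N(alpha_t * x0, sigma_t^2 I), whose conditional score
    is -(x_t - alpha_t * x0) / sigma_t^2."""
    rng = rng or np.random.default_rng(0)
    eps = rng.standard_normal(x0.shape)
    xt = alpha_t * x0 + sigma_t * eps            # sample from the kernel
    target = -(xt - alpha_t * x0) / sigma_t**2   # conditional score (= -eps/sigma_t)
    return float(np.mean((score_fn(xt) - target) ** 2))

x0 = np.ones(4)
# The exact conditional score attains zero loss by construction.
perfect = lambda xt: -(xt - 0.9 * x0) / 0.2**2
loss = dsm_loss(perfect, x0, alpha_t=0.9, sigma_t=0.2)
# loss == 0.0
```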
Doob’s h-transform modifies stochastic processes to satisfy specific terminal conditions by introducing an h-function into the drift term of an SDE. Given the SDE:
dxt=ft(xt)dt+gtdwt,
Doob’s h-transform alters the SDE to:
dxt=[ft(xt)+gt2h(xt,t,xT,T)]dt+gtdwt,
where h(xt,t,xT,T)=∇xtlogp(xT∣xt).
To implement the coarse-to-fine generation curriculum, the paper introduces cell complex filtering (CCF). Given a graph G=(V,E) and its associated cell complex S, the CCF operation produces a filtered graph G′=(V′,E′) where V′={v∈V∣∃xα∈S:v∈xα} and E′={(u,v)∈E∣∃xα∈S:u,v∈xα}.
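The CCF operation can be sketched in plain Python. As a simplifying assumption, the 2-cells here are taken to be triangles; the paper's cell complexes are more general (e.g., cells induced from cycles up to a bounded size).

```python
def cell_complex_filter(nodes, edges):
    """Sketch of cell complex filtering (CCF): keep only nodes and edges that
    participate in some 2-cell. Simplifying assumption: 2-cells are triangles."""
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    kept_edges = set()
    for u, v in edges:
        if adj[u] & adj[v]:          # a common neighbour closes a triangle
            kept_edges.add((u, v))
    kept_nodes = {w for e in kept_edges for w in e}
    return kept_nodes, kept_edges

# A triangle with a pendant edge: the pendant edge (3, 4) is filtered out.
nodes = {1, 2, 3, 4}
edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
v_f, e_f = cell_complex_filter(nodes, edges)
# v_f == {1, 2, 3}; e_f == {(1, 2), (2, 3), (1, 3)}
```

The filtered graph serves as the coarse intermediate target of the curriculum: tree-like periphery is removed first and regenerated in the fine stage.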
The forward and reverse diffusion processes in HOG-Diff are divided into K hierarchical time windows, denoted as {[τk−1,τk]}k=1K, where 0=τ0<⋯<τk−1<τk<⋯<τK=T. The generation process factorizes the joint distribution of the final graph G0 into a product of conditional distributions across these time windows:
$p(\bm{G}_0)=p(\bm{G}_0\mid\bm{G}_{\tau_1})\,p(\bm{G}_{\tau_1}\mid\bm{G}_{\tau_2}) \cdots p(\bm{G}_{\tau_{K-1}}\mid\bm{G}_{T})$.
During each time window [τk−1,τk], the evolution of the graph is governed by the forward SDE:
dGt(k)=fk,t(Gt(k))dt+gk,tdWt,t∈[τk−1,τk].
The guided diffusion is based on the generalized OU process governed by the SDE:
Q:dGt=θt(μ−Gt)dt+gt(Gt)dWt,
where:
- μ=Gτk is the target terminal state.
- θt is a scalar drift coefficient.
- gt is the diffusion coefficient.
Also, gt2/θt=2σ2, where σ2 is a given constant scalar.
The transition probability admits a closed-form solution:
$p(\bm{G}_{t}\mid \bm{G}_s)=\mathcal{N}(\mathbf{m}_{s:t},v_{s:t}^{2}\bm{I}) = \mathcal{N}\!\left(\bm{\mu}+\left(\bm{G}_s-\bm{\mu}\right)e^{-\bar{\theta}_{s:t}},\ \sigma^2 \left(1-e^{-2\bar{\theta}_{s:t}}\right)\bm{I}\right)$,
where θˉs:t=∫stθzdz.
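With a constant $\theta_t=\theta$, the integral reduces to $\bar{\theta}_{s:t}=\theta(t-s)$, and the closed-form mean and variance can be checked against a Monte Carlo Euler–Maruyama simulation of the OU SDE (a scalar sketch with arbitrary illustrative constants):

```python
import numpy as np

# Closed-form OU transition with constant theta (so theta_bar = theta*(t-s)),
# using g^2 = 2*sigma^2*theta from the constraint g_t^2 / theta_t = 2*sigma^2.
theta, sigma, mu = 1.5, 0.8, 2.0
s, t = 0.0, 1.0
G_s = -1.0

theta_bar = theta * (t - s)
mean_cf = mu + (G_s - mu) * np.exp(-theta_bar)
var_cf = sigma**2 * (1 - np.exp(-2 * theta_bar))

# Euler-Maruyama estimate of the same transition moments.
rng = np.random.default_rng(0)
n_paths, n_steps = 20000, 400
dt = (t - s) / n_steps
g = sigma * np.sqrt(2 * theta)
G = np.full(n_paths, G_s)
for _ in range(n_steps):
    G = G + theta * (mu - G) * dt + g * np.sqrt(dt) * rng.standard_normal(n_paths)

# Empirical mean/variance should match the closed form up to Monte Carlo error.
```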
When applied to the generalized OU process, Doob’s h-transform drives the diffusion process toward a Dirac distribution centered at Gτk.
The conditional marginal distribution p(Gt∣Gτk) evolves according to the SDE:
$\mathrm{d}\bm{G}_t = \theta_t \left( 1 + \frac{2}{e^{2\bar{\theta}_{t:\tau_k}}-1} \right)(\bm{G}_{\tau_k} - \bm{G}_t)\, \mathrm{d}t + g_{k,t}\, \mathrm{d}\bm{W}_t$.
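The bridged drift pins the trajectory to the terminal state as $t\to\tau_k$. A scalar Euler–Maruyama sketch with illustrative constants (not the paper's implementation), stopping one step short of $\tau_k$ where the drift diverges:

```python
import numpy as np

# Bridge drift: theta * (1 + 2 / (exp(2*theta*(tau - t)) - 1)) * (G_tau - G_t).
theta, sigma, tau = 1.0, 0.5, 1.0
g = sigma * np.sqrt(2 * theta)             # from g^2 / theta = 2 sigma^2
G_tau = 3.0                                # terminal condition of the window

rng = np.random.default_rng(0)
n_steps = 2000
dt = tau / n_steps
G = 0.0                                    # arbitrary start of the window
for i in range(n_steps - 1):               # stop one step early: drift blows up at tau
    t = i * dt
    theta_bar = theta * (tau - t)
    h = theta * (1 + 2 / np.expm1(2 * theta_bar))
    G = G + h * (G_tau - G) * dt + g * np.sqrt(dt) * rng.standard_normal()

# G should now sit close to the target G_tau despite the injected noise.
```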
Generating graph adjacency matrices presents challenges due to the non-uniqueness of graph representations, Pareto distribution and sparsity, and quadratic scaling with the number of nodes. To address these, the paper introduces noise in the eigenvalue domain of the graph Laplacian matrix L=D−A, instead of the adjacency matrix A, where D denotes the diagonal degree matrix.
As a symmetric positive semi-definite matrix, the graph Laplacian can be diagonalized as L=UΛU⊤, where:
- U=[u1,⋯,un] is the orthogonal matrix comprising the eigenvectors.
- Λ=diag(λ1,⋯,λn) is the diagonal matrix holding the corresponding eigenvalues.
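These spectral properties are easy to verify numerically on a small graph (a 4-node path, chosen arbitrarily for illustration):

```python
import numpy as np

# Laplacian of a 4-node path graph and its decomposition L = U diag(lam) U^T.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
D = np.diag(A.sum(axis=1))     # diagonal degree matrix
L = D - A

lam, U = np.linalg.eigh(L)     # eigh: symmetric input, real ascending spectrum
# L is PSD: eigenvalues are non-negative, the smallest is 0 (constant eigenvector),
# and the eigendecomposition reconstructs L exactly.
recon = U @ np.diag(lam) @ U.T
```

Diffusing the eigenvalues Λ instead of A sidesteps the sparsity and quadratic-size issues mentioned above while keeping a faithful graph representation.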
Consequently, the reverse-time SDE is split into two parts that share drift and diffusion coefficients:
$\left\{ \begin{aligned} \mathrm{d}\bm{X}_t &= \left[\mathbf{f}_{k,t}(\bm{X}_t) - g_{k,t}^2 \nabla_{\bm{X}} \log p_t(\bm{G}_t \mid \bm{G}_{\tau_k}) \right]\mathrm{d}\bar{t} + g_{k,t}\,\mathrm{d}\bar{\bm{W}}_t^1 \\ \mathrm{d}\bm{\Lambda}_t &= \left[\mathbf{f}_{k,t}(\bm{\Lambda}_t) - g_{k,t}^2 \nabla_{\bm{\Lambda}} \log p_t(\bm{G}_t \mid \bm{G}_{\tau_k})\right]\mathrm{d}\bar{t} + g_{k,t}\,\mathrm{d}\bar{\bm{W}}_t^2 \end{aligned} \right.$
To approximate the score functions, a neural network $\bm{s}^{(k)}_{\bm{\theta}}(\bm{G}_t, \bm{G}_{\tau_k},t)$ is employed, composed of a node output $\bm{s}^{(k)}_{\bm{\theta},\bm{X}}(\bm{G}_t, \bm{G}_{\tau_k},t)$ and a spectrum output $\bm{s}^{(k)}_{\bm{\theta},\bm{\Lambda}}(\bm{G}_t, \bm{G}_{\tau_k},t)$. The model is optimized by minimizing the loss function:
$\ell^{(k)}(\bm{\theta})=\mathbb{E}_{t,\bm{G}_t,\bm{G}_{\tau_{k-1}},\bm{G}_{\tau_k}}\Big\{\omega(t) \big[c_1\|\bm{s}^{(k)}_{\bm{\theta},\bm{X}} - \nabla_{\bm{X}} \log p_t(\bm{G}_t \mid \bm{G}_{\tau_k})\|_2^2 + c_2 \|\bm{s}^{(k)}_{\bm{\theta},\bm{\Lambda}} - \nabla_{\bm{\Lambda}} \log p_t(\bm{G}_t \mid \bm{G}_{\tau_k})\|_2^2\big]\Big\},$
where ω(t) is a positive weighting function, and c1,c2 control the relative importance of vertices and spectrum.
The paper provides supporting theoretical evidence for the efficacy of HOG-Diff, demonstrating that the framework achieves faster convergence in score matching and tighter reconstruction error bounds than standard graph diffusion models. Suppose the loss function $\ell^{(k)}(\bm{\theta})$ above is β-smooth and satisfies the μ-PL (Polyak-Łojasiewicz) condition in the ball B(θ0,R). Then the expected loss at the i-th training iteration satisfies:
$\mathbb{E}\big[\ell^{(k)}(\bm{\theta}_i)\big] \leq \left(1-\frac{b\mu^2}{\beta N\left(\beta N^2+\mu(b-1)\right)}\right)^{i} \ell^{(k)}(\bm{\theta}_0)$,
where:
- N denotes the size of the training dataset.
- b is the mini-batch size.
The paper states that $\beta_{\text{HOG-Diff}}\leq \beta_{\text{classical}}$, implying that the distribution learned by the proposed framework converges to the target distribution faster than classical generative models.
The expected reconstruction error of each generation window is defined as $\mathcal{E}(t)=\mathbb{E}\big[\|\bar{\bm{G}}_t-\widehat{\bm{G}}_t\|^2\big]$, where $\bar{\bm{G}}_t$ represents the data reconstructed using the ground-truth score $\nabla \log p_t(\cdot)$ and $\widehat{\bm{G}}_t$ denotes the data reconstructed with the learned score function $\bm{s}_{\bm{\theta}}$. Under appropriate Lipschitz and boundedness assumptions, the reconstruction error of HOG-Diff satisfies the bound:
$\mathcal{E}(0) \leq \alpha(0)\exp\left(\int_0^{\tau_1} \gamma(s)\, \mathrm{d}s\right)$,
where:
- $\alpha(0)=C^2 \ell^{(1)}(\bm{\theta}) \int_0^{\tau_1} g_{1,s}^4\, \mathrm{d}s + C\, \mathcal{E}(\tau_1) \int_0^{\tau_1} h_{1,s}^2\, \mathrm{d}s$.
- $\gamma(s) = C^2 g_{1,s}^4 \|\bm{s}_{\bm{\theta}}(\cdot,s)\|_{\mathrm{lip}}^2 + C \|h_{1,s}\|_{\mathrm{lip}}^2$.
- $h_{1,s} = \theta_s \left(1 + \frac{2}{e^{2\bar{\theta}_{s:\tau_1}}-1}\right)$.
The reconstruction error bound of HOG-Diff is sharper than that of classical graph generation models.
The denoising network $\bm{s}^{(k)}_{\bm{\theta}}(\bm{G}_t, \bm{G}_{\tau_k},t)$ estimates the score functions. The network comprises a graph convolution network (GCN) for local feature aggregation and a graph transformer network (ATTN) for global information extraction. The outputs of these modules are fused with time information through a Feature-wise Linear Modulation (FiLM) layer. The resulting representations are concatenated into a unified hidden embedding, which is processed through separate multilayer perceptrons (MLPs) to produce predictions for $\nabla_{\bm{X}} \log p(\bm{G}_t|\bm{G}_{\tau_k})$ and $\nabla_{\bm{\Lambda}} \log p(\bm{G}_t|\bm{G}_{\tau_k})$, respectively.
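The FiLM fusion step amounts to a time-conditioned scale and shift of node features. A minimal sketch with hypothetical dimensions and random weights (the paper's layer is learned end-to-end inside the full network):

```python
import numpy as np

def film(h, t_emb, W_gamma, W_beta):
    """Feature-wise Linear Modulation: scale and shift features h by
    time-conditioned gamma and beta vectors (a sketch, not the paper's
    exact architecture)."""
    gamma = t_emb @ W_gamma        # (d_t,) @ (d_t, d_h) -> (d_h,)
    beta = t_emb @ W_beta
    return gamma * h + beta        # broadcast over the node dimension

rng = np.random.default_rng(0)
n_nodes, d_h, d_t = 6, 8, 4
h = rng.standard_normal((n_nodes, d_h))   # node embeddings (e.g. GCN/ATTN output)
t_emb = rng.standard_normal(d_t)          # in practice a sinusoidal time embedding
out = film(h, t_emb,
           rng.standard_normal((d_t, d_h)),
           rng.standard_normal((d_t, d_h)))
```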
The paper experimentally validates HOG-Diff against state-of-the-art baselines for both molecular and generic graph generation. For molecular generation, evaluations are conducted on QM9 and ZINC250k datasets, using metrics such as Neighborhood Subgraph Pairwise Distance Kernel (NSPDK) MMD, Frechet ChemNet Distance (FCD), Validity (Val.), Validity without correction (Val. w/o corr.), Uniqueness (Uni.), and Novelty (Nov.).
For generic graph generation, the model is evaluated over three common generic graph datasets: Community-small, Ego-small, and Enzymes. The paper employs the maximum mean discrepancy (MMD) to quantify the distribution differences across key graph statistics, including degree (Deg.), clustering coefficient (Clus.), and the number of occurrences of orbits with 4 nodes (Orbit).
Ablation studies are conducted using different types of topological information as guides: structures derived from 2-cell filtering (Cell), peripheral structures obtained by removing cell components (Peripheral), and Gaussian random noise (Noise). Results show that cell structures are more effective in guiding the generation.
The paper concludes by emphasizing HOG-Diff's ability to generate high-quality samples and its potential for improving interpretability in graph generation.