HierDiff: Hierarchical Diffusion for 3D Molecules

Updated 17 February 2026

HierDiff is a hierarchical diffusion-based generative model for 3D molecular structure generation, employing a coarse-to-fine approach with SE(3)-equivariant diffusion.
It leverages chemically informed fragmentization and iterative EGNN-based decoding to reconstruct atom-level geometries while preserving chemical validity.
Empirical results indicate enhanced drug-likeness, conformation coverage, and stability compared to traditional atomistic and autoregressive methods.

HierDiff is a hierarchical diffusion-based generative model designed for 3D molecular structure generation. Its primary contribution is a coarse-to-fine methodology that leverages chemically meaningful fragments and SE(3)-equivariant diffusion, enabling high-quality, non-autoregressive molecule generation while preserving local structural validity. This framework addresses common shortcomings of earlier atomistic or autoregressive generative approaches, particularly in maintaining chemical plausibility and scalability for larger molecular systems (Qiang et al., 2023).

1. Coarse-to-Fine Molecular Generation Pipeline

HierDiff operates via a three-stage process:

Fragmentization: Input molecules are decomposed into a graph where each node represents a chemically valid fragment (such as rings or functional groups) and edges indicate shared atoms or bonds. Fragmentization uses a minimum-spanning-tree scheme to avoid cyclic dependencies and balance vocabulary size against molecular coverage.
Coarse-Grained Diffusion: Each fragment node is described by an invariant feature vector $H^f_i \in \mathbb{R}^{d_f}$ (encoding properties like size, ring count, element histogram) and an equivariant spatial coordinate $H^p_i = x_i \in \mathbb{R}^3$ (fragment center). A non-autoregressive SE(3)-equivariant diffusion model jointly operates over $(H^f, H^p)$, generating a new coarse molecular graph in latent space through denoising diffusion probabilistic modeling.
Fine-Grained Decoding: The coarse graph is iteratively decoded into fine-grained fragments, ultimately reconstructing atom-level geometries. This involves sequential node and edge growth, fragment-type sampling via SE(3)-equivariant GNNs (modified EGNNs), and an iterative refinement process. The all-atom structure is assembled by aligning and merging atomic templates from an RDKit ETKDG library using the Kabsch algorithm, ensuring chemical constraints (valency, bonding) are obeyed.
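
The coarse-graining step can be illustrated with a minimal sketch. It assumes a fragment is given as a list of atom indices, takes the fragment center as the mean atom position, and uses fragment size plus an element histogram as the invariant feature. The `ELEMENTS` vocabulary, the function name `coarse_grain`, and the toy molecule are hypothetical and are not taken from the HierDiff code.

```python
import numpy as np

# Hypothetical element vocabulary used for the histogram feature.
ELEMENTS = ["C", "N", "O", "S"]

def coarse_grain(atom_xyz, atom_elems, fragments):
    """Map atoms to fragment-level features (H^f) and centers (H^p).

    atom_xyz:   (N, 3) array of atom coordinates
    atom_elems: list of N element symbols
    fragments:  list of lists of atom indices (fragments may share atoms)
    """
    feats, centers = [], []
    for atom_ids in fragments:
        ids = np.asarray(atom_ids)
        # Equivariant part: the fragment center is the mean atom position.
        centers.append(atom_xyz[ids].mean(axis=0))
        # Invariant part: fragment size followed by an element histogram.
        hist = [sum(atom_elems[i] == e for i in atom_ids) for e in ELEMENTS]
        feats.append([len(atom_ids)] + hist)
    return np.array(feats, dtype=float), np.array(centers)

# Toy example: 5 atoms on a line, split into two fragments sharing atom 2.
xyz = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0],
                [3.0, 0.0, 0.0], [4.0, 0.0, 0.0]])
elems = ["C", "C", "N", "O", "C"]
H_f, H_p = coarse_grain(xyz, elems, [[0, 1, 2], [2, 3, 4]])
```

Rotating or translating `xyz` leaves `H_f` unchanged and moves `H_p` with the molecule, which is the invariant/equivariant split the diffusion model relies on.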

2. SE(3)-Equivariant Diffusion on Fragment Graphs

The HierDiff diffusion model generalizes denoising diffusion probabilistic models (DDPMs) to operate on SE(3)-equivariant, coarse-grained molecular representations:

Noising Process: Gaussian noise is independently added to both invariant features $H^f$ and positions $H^p$ across diffusion steps:

$q(H^f_t|H^f_{t-1}) = \mathcal{N}\left(H^f_t; \sqrt{1-\beta_t}H^f_{t-1}, \beta_t I\right),$

$q(x_t|x_{t-1}) = \mathcal{N}\left(x_t; \sqrt{1-\beta_t}x_{t-1}, \beta_t I_{3M}\right)$

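where $M$ is the number of fragment nodes. The positional prior is a Gaussian restricted to the zero-center-of-mass subspace, which removes the translational degree of freedom while keeping the prior rotation-invariant.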
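
As a rough illustration of the noising process, the sketch below samples a noised state directly from the clean one using the standard DDPM closed form $z_t = \sqrt{\bar\alpha_t}\, z_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$, with $\bar\alpha_t = \prod_{s \le t}(1-\beta_s)$. The linear schedule, the array sizes, and the projection of the positional noise onto the zero-center-of-mass subspace are assumptions made for this example, not details confirmed by the source.

```python
import numpy as np

def remove_center_of_mass(x):
    """Project positions onto the zero-center-of-mass subspace."""
    return x - x.mean(axis=0, keepdims=True)

def q_sample(h0, x0, t, betas, rng):
    """Sample (H^f_t, x_t) directly from (H^f_0, x_0).

    t is a 0-based index into betas. Composing the per-step Gaussians gives
    z_t = sqrt(abar_t) * z_0 + sqrt(1 - abar_t) * eps.
    """
    abar_t = np.cumprod(1.0 - betas)[t]
    eps_h = rng.standard_normal(h0.shape)
    # Assumption: positional noise is projected so x_t keeps a zero center of mass.
    eps_x = remove_center_of_mass(rng.standard_normal(x0.shape))
    h_t = np.sqrt(abar_t) * h0 + np.sqrt(1.0 - abar_t) * eps_h
    x_t = np.sqrt(abar_t) * x0 + np.sqrt(1.0 - abar_t) * eps_x
    return h_t, x_t, eps_h, eps_x

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)      # assumed linear schedule
h0 = rng.standard_normal((6, 8))           # 6 fragments, d_f = 8 (toy sizes)
x0 = remove_center_of_mass(rng.standard_normal((6, 3)))
h_t, x_t, eps_h, eps_x = q_sample(h0, x0, t=500, betas=betas, rng=rng)
```

The returned `eps_h` and `eps_x` are the regression targets a noise-prediction network would be trained against.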

Denoising Process: Neural networks $\epsilon_\theta$ and $\epsilon_\phi$ predict the noise for positions and features, respectively, using stacks of EGNN layers that maintain SE(3)-equivariance. The reverse distribution for positions is parameterized as:

$p_\theta(x_{t-1}|x_t) = \mathcal{N}\left(x_{t-1};\,\mu_\theta(x_t,t),\,\tilde\beta_t I\right),$

with

$\mu_\theta(x_t,t) = \frac{1}{\sqrt{1-\beta_t}}\Big(x_t - \frac{\beta_t}{\sqrt{1-\bar\alpha_t}} \epsilon_\theta(x_t,t)\Big).$
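
The reverse-step formulas can be checked numerically. The sketch below implements $\mu_\theta$ and the posterior variance $\tilde\beta_t = \frac{1-\bar\alpha_{t-1}}{1-\bar\alpha_t}\beta_t$ (the standard DDPM choice, assumed here), then confirms that plugging in the true noise reproduces the mean of the exact posterior $q(x_{t-1} \mid x_t, x_0)$. The schedule and array sizes are illustrative, and a trained network would supply `eps_pred` in practice.

```python
import numpy as np

def reverse_mean(x_t, eps_pred, t, betas):
    """mu_theta(x_t, t) for a 0-based step index t."""
    alphas = 1.0 - betas
    abar = np.cumprod(alphas)
    return (x_t - betas[t] / np.sqrt(1.0 - abar[t]) * eps_pred) / np.sqrt(alphas[t])

def posterior_variance(t, betas):
    """tilde beta_t = (1 - abar_{t-1}) / (1 - abar_t) * beta_t, valid for t >= 1."""
    abar = np.cumprod(1.0 - betas)
    return (1.0 - abar[t - 1]) / (1.0 - abar[t]) * betas[t]

def reverse_step(x_t, eps_pred, t, betas, rng):
    """Draw x_{t-1} ~ N(mu_theta, tilde beta_t * I)."""
    noise = rng.standard_normal(x_t.shape)
    return reverse_mean(x_t, eps_pred, t, betas) + np.sqrt(posterior_variance(t, betas)) * noise

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 100)       # assumed schedule
abar = np.cumprod(1.0 - betas)
t = 50
x0 = rng.standard_normal((6, 3))
eps = rng.standard_normal((6, 3))
x_t = np.sqrt(abar[t]) * x0 + np.sqrt(1.0 - abar[t]) * eps

# With the true noise, mu_theta should equal the exact posterior mean.
mu = reverse_mean(x_t, eps, t, betas)
posterior_mean = (
    np.sqrt(abar[t - 1]) * betas[t] / (1.0 - abar[t]) * x0
    + np.sqrt(1.0 - betas[t]) * (1.0 - abar[t - 1]) / (1.0 - abar[t]) * x_t
)
x_prev = reverse_step(x_t, eps, t, betas, rng)
```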

Network Structure: Each EGNN layer incorporates edge, node, and coordinate updates, ensuring equivariant propagation of information across the fragment graph.
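
To make the edge, coordinate, and node updates concrete, here is a minimal NumPy sketch of one EGNN-style layer on a fully connected graph, following the general form of Satorras et al.'s EGNN rather than HierDiff's modified variant. Each learned map is reduced to a single random linear layer with a `tanh`, and all names and sizes are hypothetical. The last lines check numerically that the layer is equivariant to a rotation and translation of the input.

```python
import numpy as np

def init_params(d, rng):
    """Random weights for the three small maps (single linear layers here)."""
    return {
        "W_e": rng.standard_normal((2 * d + 1, d)) * 0.1,  # edge map
        "W_x": rng.standard_normal((d, 1)) * 0.1,          # coordinate map
        "W_h": rng.standard_normal((2 * d, d)) * 0.1,      # node map
    }

def egnn_layer(h, x, params):
    """One EGNN-style update. h: (n, d) invariant features, x: (n, 3) coordinates."""
    n, d = h.shape
    diff = x[:, None, :] - x[None, :, :]                   # (n, n, 3): x_i - x_j
    dist2 = (diff ** 2).sum(-1, keepdims=True)             # (n, n, 1): invariant
    h_i = np.broadcast_to(h[:, None, :], (n, n, d))
    h_j = np.broadcast_to(h[None, :, :], (n, n, d))
    # Edge update: messages depend only on invariant quantities.
    m = np.tanh(np.concatenate([h_i, h_j, dist2], axis=-1) @ params["W_e"])
    mask = 1.0 - np.eye(n)[:, :, None]                     # drop self-messages
    # Coordinate update: move along relative vectors, scaled by a scalar gate.
    x_new = x + (diff * np.tanh(m @ params["W_x"]) * mask).sum(1) / (n - 1)
    # Node update: aggregate messages, then apply a residual update.
    agg = (m * mask).sum(1)
    h_new = h + np.tanh(np.concatenate([h, agg], axis=-1) @ params["W_h"])
    return h_new, x_new

rng = np.random.default_rng(0)
params = init_params(d=4, rng=rng)
h = rng.standard_normal((5, 4))
x = rng.standard_normal((5, 3))

# Random rotation (orthogonal, determinant +1) and translation.
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
if np.linalg.det(Q) < 0:
    Q[:, 0] *= -1.0
shift = rng.standard_normal(3)

h_out, x_out = egnn_layer(h, x, params)
h_rot, x_rot = egnn_layer(h, x @ Q.T + shift, params)
```

Because messages see only squared distances and coordinates move only along relative vectors, `h_rot` matches `h_out` and `x_rot` matches the transformed `x_out`.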

3. Fine-Grained Decoding and Iterative Refinement

HierDiff decodes coarse graphs into atomic structures through a message-passing approach comprising several probabilistic subroutines:

Focal Node Selection ($\mathcal{P}_{\text{focal}}$): Identifies the next fragment node to expand.
Edge Existence ($\mathcal{P}_{\text{edge}}$): Predicts whether an edge (connection to a new fragment) should be formed, using context-informed EGNN message passing.
Fragment Type ($\mathcal{P}_{\text{node}}$): Samples the type of new fragments to attach.
Iterative Refinement ($\mathcal{P}_{\text{refine}}$): Enhances structural validity and global consistency by randomly masking and resampling node types, driven by likelihood improvement criteria.

All subroutines are implemented via EGNN variants. Generated fragment graphs are subsequently assembled into all-atom structures by associating each fragment node with an atomistic template and aligning it to the generated positions.
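
The mask-and-resample idea behind iterative refinement can be sketched in a few lines. This is a toy illustration only: the `log_likelihood` and `propose` callables stand in for the learned EGNN-based models, and the strict-improvement acceptance rule is one simple reading of "likelihood improvement criteria", not the paper's confirmed procedure.

```python
import random

def refine(types, log_likelihood, propose, n_rounds, rng):
    """Mask-and-resample refinement with a likelihood-improvement test.

    types:          list of fragment-type ids, one per node
    log_likelihood: callable scoring a full assignment (stand-in for the model)
    propose:        callable (types, node, rng) -> candidate type for the masked node
    """
    current = list(types)
    score = log_likelihood(current)
    for _ in range(n_rounds):
        node = rng.randrange(len(current))        # randomly mask one node
        candidate = list(current)
        candidate[node] = propose(current, node, rng)
        new_score = log_likelihood(candidate)
        if new_score > score:                     # keep only strict improvements
            current, score = candidate, new_score
    return current, score

# Hypothetical stand-ins: the "model" prefers neighbouring nodes on a chain to
# have different types, and proposals are uniform over 3 fragment types.
def toy_log_likelihood(types):
    return -float(sum(a == b for a, b in zip(types, types[1:])))

def toy_propose(types, node, rng):
    return rng.randrange(3)

rng = random.Random(0)
initial = [0, 0, 0, 0, 0]
refined, refined_score = refine(initial, toy_log_likelihood, toy_propose, 200, rng)
```

By construction the score never decreases, so the refined assignment is at least as likely under the scoring function as the initial one.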

4. Assembly into Atomic Structure

Fragment assembly in HierDiff proceeds as follows:

For each node, possible 3D conformations are retrieved from an RDKit ETKDG library.
Atom–atom attachment points between fragments are selected to ensure chemical valency.
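Local conformations are rigidly aligned using the Kabsch algorithm, which solves

$R, t = \underset{R^T R = I,\ \det R = 1}{\arg\min} \sum_i \| x_i - (R y_i + t) \|^2$

where $y_i$ is a template atom position and $x_i$ its corresponding target position, and then maps each template atom to $R y_i + t$.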

Overlapping atoms are merged, producing a single connected atom-level graph that preserves chemical validity.
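
The rigid alignment step can be written compactly with an SVD. The following is a generic NumPy implementation of the Kabsch algorithm, offered as a sketch rather than HierDiff's own code. The point sets are synthetic, and in the real pipeline the correspondences would come from the chosen attachment atoms.

```python
import numpy as np

def kabsch(y, x):
    """Return rotation R and translation t minimising sum_i ||x_i - (R y_i + t)||^2.

    y, x: (n, 3) arrays of corresponding points (y is moved onto x).
    """
    y_mean, x_mean = y.mean(axis=0), x.mean(axis=0)
    # Cross-covariance of the centred point sets.
    H = (y - y_mean).T @ (x - x_mean)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last singular direction if needed so that det(R) = +1.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = x_mean - R @ y_mean
    return R, t

# Synthetic check: recover a known rigid transform.
rng = np.random.default_rng(0)
template = rng.standard_normal((6, 3))
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
if np.linalg.det(Q) < 0:
    Q[:, 0] *= -1.0
R_true, t_true = Q, np.array([1.0, -2.0, 0.5])
target = template @ R_true.T + t_true

R, t = kabsch(template, target)
aligned = template @ R.T + t
```

The determinant correction matters for molecules: without it the solver may return a reflection, which would invert chirality.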

5. Training Objectives and Optimization

HierDiff is trained via a weighted sum of diffusion and decoding losses:

Diffusion Losses ($\mathcal{L}_{\text{diff}}$): Mean squared error for both positional and feature denoising, with an additional loss $\mathcal{L}_0$ for discrete fragment features.
Decoding Cross-Entropy Losses:
- $\mathcal{L}_{\text{focal}}$ for focal node selection
- $\mathcal{L}_{\text{edge}}$ for edge prediction
- $\mathcal{L}_{\text{node}}$ for fragment type prediction
- $\mathcal{L}_{\text{refine}}$ for iterative refinement
Total Loss (written here without explicit weighting coefficients):

$\mathcal{L} = \mathcal{L}_{\text{diff}} + \mathcal{L}_{\text{focal}} + \mathcal{L}_{\text{edge}} + \mathcal{L}_{\text{node}} + \mathcal{L}_{\text{refine}}$
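
As a small numerical illustration of how these terms combine, the sketch below adds two mean-squared-error denoising terms to four cross-entropy decoding terms with unit weights. It omits the $\mathcal{L}_0$ term and uses dummy predictions. The function names and the dictionary layout of decoder outputs are hypothetical.

```python
import numpy as np

def mse(pred, target):
    """Mean squared error over all entries."""
    return float(np.mean((pred - target) ** 2))

def cross_entropy(logits, labels):
    """Mean cross-entropy for integer class labels."""
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())

def combined_loss(eps_x_pred, eps_x, eps_h_pred, eps_h, decoder_outputs):
    """Unit-weight sum of denoising MSE terms and decoding cross-entropies."""
    l_diff = mse(eps_x_pred, eps_x) + mse(eps_h_pred, eps_h)
    l_dec = sum(cross_entropy(logits, labels)
                for logits, labels in decoder_outputs.values())
    return l_diff + l_dec

# Dummy batch: zero predictions against unit noise, uniform decoder logits.
labels = np.array([0, 1, 0])
decoder_outputs = {
    "focal":  (np.zeros((3, 2)), labels),
    "edge":   (np.zeros((3, 2)), labels),
    "node":   (np.zeros((3, 4)), labels),
    "refine": (np.zeros((3, 4)), labels),
}
loss = combined_loss(np.zeros((5, 3)), np.ones((5, 3)),
                     np.zeros((5, 8)), np.ones((5, 8)), decoder_outputs)
```

With uniform logits each cross-entropy equals the log of its class count, so the total here is $2 + 2\ln 2 + 2\ln 4$.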

6. Empirical Evaluation and Results

HierDiff was evaluated on three datasets: GEOMDRUG (304,000 drug-like molecules), CrossDocked2020 (100,000 protein-bound ligands), and QM9 (small molecules). Key baselines included EDM (fully atom-level equivariant diffusion) and G-SphereNet (autoregressive 3D flow).

Performance was assessed via drug-likeness metrics (QED, RA, MCF, SAS, ΔLogP, ΔMW), conformation coverage/matching (at both atom and fragment level, using RMSD < 2Å), and chemical validity/stability (fraction of valid, unique, and connected molecules). Highlighted results include:

GEOMDRUG: HierDiff-P achieved QED = 0.639 (EDM: 0.608), RA = 0.659 (EDM: 0.548), and MCF = 0.774 (EDM: 0.621).
Atom-level validity and stability were approximately 100% with full hydrogen reconstruction, compared to < 97% for EDM.
Conformation coverage (atom): 0.490–0.546 (EDM: 0.489); fragment coverage: 0.202 (EDM: 0.097).
HierDiff exhibited lower conformer MD energy MMD distances (greater stability) than EDM or JT-VAE + ETKDG (Qiang et al., 2023).
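
For readers unfamiliar with coverage and matching scores, the sketch below uses one common definition. Coverage is the fraction of reference conformers that have at least one generated conformer within the RMSD threshold, and matching is the mean RMSD from each reference to its closest generated conformer. It assumes conformers are already aligned with matching atom order. The source does not spell out the exact evaluation protocol, so treat this as illustrative.

```python
import numpy as np

def rmsd(a, b):
    """RMSD between two (n, 3) arrays with matching, pre-aligned point order."""
    return float(np.sqrt(((a - b) ** 2).sum(axis=1).mean()))

def coverage_and_matching(references, generated, threshold=2.0):
    """Return (coverage, matching) for lists of (n, 3) conformer arrays."""
    # Distance from each reference to its closest generated conformer.
    best = [min(rmsd(ref, gen) for gen in generated) for ref in references]
    coverage = sum(b < threshold for b in best) / len(best)
    matching = sum(best) / len(best)
    return coverage, matching

# Toy data: one reference is close to the generated conformer, one is far.
ref_a = np.zeros((4, 3))
ref_b = np.full((4, 3), 10.0)
gen = np.zeros((4, 3))
gen[:, 0] = 1.0                     # every atom shifted by 1 Å along x
cov, mat = coverage_and_matching([ref_a, ref_b], [gen])
```

Here `ref_a` is covered (RMSD of 1 Å) and `ref_b` is not, so coverage is 0.5.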

7. Ablation Studies and Component Analysis

Ablation experiments revealed:

Removing iterative refinement ($\mathcal{P}_{\text{refine}}$) reduced QED by ∼0.01 and MCF by ∼0.05, underscoring its importance for chemical validity.
The minimum-spanning-tree decomposition for fragmentization achieved a superior balance between vocabulary size and molecular coverage compared to alternatives.
Property-based features (HierDiff-P) yielded slight improvements in QED/MCF, while element-histogram features (HierDiff-E) led to marginally better fragment-level RMSD.
The non-autoregressive global decoder avoided the validity drop seen in autoregressive methods (e.g., G-SphereNet) as molecule size increased.
HierDiff maintained competitive performance even with reduced diffusion steps, attributed to the lower-dimensional latent coarse space.
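
The minimum-spanning-tree idea mentioned in the ablations can be illustrated with Kruskal's algorithm on a toy fragment graph. The edge weights and the example graph are hypothetical, since the source does not say how candidate fragment connections are weighted. The sketch shows only how a spanning tree removes cyclic dependencies between fragments.

```python
def minimum_spanning_tree(n_nodes, weighted_edges):
    """Kruskal's algorithm. weighted_edges: iterable of (weight, u, v) tuples."""
    parent = list(range(n_nodes))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    tree = []
    for weight, u, v in sorted(weighted_edges):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:          # adding this edge does not close a cycle
            parent[root_u] = root_v
            tree.append((u, v))
    return tree

# Toy fragment graph: 4 fragments, with a cycle among fragments 0, 1 and 2.
edges = [(1, 0, 1), (1, 1, 2), (2, 0, 2), (1, 2, 3)]
tree = minimum_spanning_tree(4, edges)
```

The heavier edge between fragments 0 and 2 is dropped, leaving an acyclic tree with one edge fewer than the number of fragments.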

HierDiff’s chemically informed fragment latent space, principled SE(3)-equivariant diffusion, and GNN-based coarse-to-fine decoding provide a robust framework for 3D molecular generation, demonstrating enhanced drug-likeness, structural validity, and conformational diversity relative to contemporary baselines (Qiang et al., 2023).

References

Qiang et al. (2023). Coarse-to-Fine: a Hierarchical Diffusion Model for Molecule Generation in 3D.
