PuckerFlow: Generative Conformer Sampling
- PuckerFlow is a generative machine learning framework that uses Cremer–Pople coordinates to model the puckering modes of cyclic molecules.
- It combines continuous-time flow matching with an E(3)-equivariant graph neural network and a cyclic Fourier filter to ensure accurate ring closure.
- The method achieves state-of-the-art precision and diversity in conformer generation, enabling high-throughput exploration for drug discovery and catalysis.
PuckerFlow is a generative machine learning framework for sampling conformers of cyclic molecules by operating directly on Cremer–Pople (CP) internal coordinates. These coordinates represent the (N–3) "puckering" degrees of freedom of an N-membered ring, capturing essential ring deformations such as boats, envelopes, and chairs while excluding rigid-body and exocyclic modes. PuckerFlow advances conformational sampling by combining continuous-time flow matching in CP space with an E(3)-equivariant graph neural network (GNN) and a cyclic Fourier filter. The model achieves state-of-the-art precision and diversity in cyclic conformer generation, efficiently generating valid closed rings and supporting high-throughput exploration of ring scaffolds relevant to drug discovery and catalysis (Schaufelberger et al., 19 Jan 2026).
1. Cremer–Pople Coordinate Framework
PuckerFlow is built upon the Cremer–Pople coordinate system, which provides a low-dimensional representation for the out-of-plane deformations in monocyclic rings. Given a ring of N atoms:
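- The molecule is oriented such that its mean plane coincides with the $x$–$y$ plane.
- The out-of-plane displacement for each atom is defined as $z_j = \mathbf{R}_j \cdot \mathbf{n}$, where $\mathbf{R}_j$ is the position of atom $j$ relative to the ring's geometric center and $\mathbf{n}$ is the unit normal of the mean plane.
- Atom positions around the ring are parameterized by $\theta_j = 2\pi (j-1)/N$ for $j = 1, \dots, N$.
- For each puckering mode $m = 2, \dots, \lfloor (N-1)/2 \rfloor$, the coordinates $(q_m, \phi_m)$ are determined by discrete Fourier transforms:

$$q_m \cos\phi_m = \sqrt{\tfrac{2}{N}} \sum_{j=1}^{N} z_j \cos(m\theta_j), \qquad q_m \sin\phi_m = -\sqrt{\tfrac{2}{N}} \sum_{j=1}^{N} z_j \sin(m\theta_j).$$

- For even $N$, an additional mode is given by:

$$q_{N/2} = \frac{1}{\sqrt{N}} \sum_{j=1}^{N} (-1)^{j-1} z_j.$$

- The aggregated puckering amplitude is $Q = \sqrt{\sum_m q_m^2} = \sqrt{\sum_j z_j^2}$; $Q = 0$ defines a planar ring.

By this transformation, the entire set of $3N$ Cartesian coordinates reduces to $N-3$ internal degrees of freedom, succinctly characterizing the physically allowed, closed geometries of the ring without redundancy (Schaufelberger et al., 19 Jan 2026).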
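The Cartesian-to-CP transformation can be sketched in a few lines of NumPy. This is a minimal illustration of the standard Cremer–Pople definitions, not the ring-puckering library used by the authors; the function name `cremer_pople` and the toy chair geometry are assumptions made for the example.

```python
import numpy as np

def cremer_pople(coords):
    """Return (amplitudes, phases, z) for one ring given as an (N, 3) array in ring order."""
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    centered = coords - coords.mean(axis=0)      # origin at the geometric center
    theta = 2.0 * np.pi * np.arange(n) / n       # theta_j = 2*pi*(j-1)/N
    # Mean-plane normal from the two Cremer-Pople reference vectors.
    r1 = (centered * np.sin(theta)[:, None]).sum(axis=0)
    r2 = (centered * np.cos(theta)[:, None]).sum(axis=0)
    normal = np.cross(r1, r2)
    normal /= np.linalg.norm(normal)
    z = centered @ normal                        # out-of-plane displacements z_j

    amplitudes, phases = [], []
    for m in range(2, (n - 1) // 2 + 1):         # paired modes (q_m, phi_m)
        a = np.sqrt(2.0 / n) * np.sum(z * np.cos(m * theta))
        b = -np.sqrt(2.0 / n) * np.sum(z * np.sin(m * theta))
        amplitudes.append(np.hypot(a, b))
        phases.append(np.arctan2(b, a) % (2.0 * np.pi))
    if n % 2 == 0:                               # extra unpaired mode for even N
        amplitudes.append(np.sum(z * (-1.0) ** np.arange(n)) / np.sqrt(n))
    return np.array(amplitudes), np.array(phases), z

# Toy chair-like six-membered ring (hypothetical geometry, not from the dataset).
angles = 2.0 * np.pi * np.arange(6) / 6
chair = np.stack([1.4 * np.cos(angles), 1.4 * np.sin(angles),
                  0.25 * (-1.0) ** np.arange(6)], axis=1)
amps, phases, z = cremer_pople(chair)
print(len(amps) + len(phases))   # number of puckering coordinates, N - 3
```

For the six-membered toy ring the result has two amplitudes and one phase, and the sign of the unpaired amplitude depends on the direction in which the atoms are numbered around the ring.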
2. Flow Matching in Cremer–Pople Manifold
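PuckerFlow applies flow matching on the CP manifold rather than in Cartesian space. The generative model learns a time-dependent vector field $v_\theta(\mathbf{q}, t)$ in CP space to transport a uniform, bounded prior $p_0$ to the empirical ring conformer distribution $p_1$ via the ODE:

$$\frac{d\mathbf{q}_t}{dt} = v_\theta(\mathbf{q}_t, t), \qquad \mathbf{q}_0 \sim p_0.$$

This process is governed by the density evolution:

$$\frac{\partial p_t}{\partial t} + \nabla \cdot \left(p_t\, v_\theta\right) = 0.$$

Training uses the Conditional Flow Matching (CFM) objective, where for $t \sim \mathcal{U}(0, 1)$, $\mathbf{q}_0 \sim p_0$, $\mathbf{q}_1 \sim p_1$, and $\mathbf{q}_t \sim p_t(\cdot \mid \mathbf{q}_0, \mathbf{q}_1)$,

$$\mathcal{L}_{\mathrm{CFM}}(\theta) = \mathbb{E}\left[\left\| v_\theta(\mathbf{q}_t, t) - u_t(\mathbf{q}_t \mid \mathbf{q}_0, \mathbf{q}_1) \right\|^2\right],$$

with $u_t$ the conditional target velocity.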
This approach enables direct specification of physical feasibility (e.g., closed-ring constraints) through the prior in CP space, thereby avoiding artifacts from unphysical regions that are common in diffusion models using Gaussian noise.
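To make the training objective concrete, the following is a minimal sketch of a Monte Carlo CFM loss. It assumes the common linear interpolation path between a prior sample and a data sample, which the text above does not confirm, and it treats the CP coordinates as a flat vector, ignoring the periodicity of the phase angles. The names `cfm_loss`, `optimal`, and `zero` are hypothetical.

```python
import numpy as np

def cfm_loss(v_fn, x0, x1, t):
    """Monte Carlo estimate of the conditional flow matching loss.

    v_fn : callable (x_t, t) -> predicted velocity, shape (B, D); t arrives with shape (B, 1)
    x0   : prior samples, shape (B, D)
    x1   : data samples, shape (B, D)
    t    : times in [0, 1), shape (B,)
    """
    t = t[:, None]
    x_t = (1.0 - t) * x0 + t * x1      # assumed linear interpolation path
    target = x1 - x0                   # velocity of that path
    residual = v_fn(x_t, t) - target
    return float(np.mean(np.sum(residual ** 2, axis=1)))

rng = np.random.default_rng(0)
x0 = rng.uniform(-0.5, 0.5, size=(256, 3))   # stand-in for the bounded uniform prior
x1 = np.full((256, 3), 0.3)                  # toy "data": a single conformer
t = rng.uniform(0.0, 1.0, size=256)

# For a point-mass target the optimal field is known: v = (x1 - x_t) / (1 - t).
optimal = lambda x_t, t: (0.3 - x_t) / (1.0 - t)
zero = lambda x_t, t: np.zeros_like(x_t)
print(cfm_loss(optimal, x0, x1, t), cfm_loss(zero, x0, x1, t))
```

In practice `v_fn` would be the trained network; the closed-form field here only serves to show that the loss vanishes when the prediction matches the path velocity.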
3. Model Architecture: Equivariant GNN and Cyclic Fourier Filter
The architecture combines a 3D E(3)-equivariant GNN and a cyclic Fourier filter:
- For each training batch, partial CP samples are converted into approximate Cartesian ring geometries using precomputed average bond lengths and angles, stratified by atom types and ring size.
- The GNN employs message passing up to spherical-harmonic order $l = 2$ on node and edge features (atomic number, hybridization, ring size, an embedding of the time $t$, and interatomic distances within 5 Å) to produce rotation- and reflection-equivariant embeddings.
- A cyclic Fourier filter operates as the final layer: learnable, rotation-equivariant filters are arranged around the ring, their outputs are convolved with the node embeddings, and a discrete Fourier transform yields the pseudoscalar outputs for the CP puckering modes.
- Hyperparameters validated during development include 4–6 GNN interaction layers, 32-dimensional scalar channels, 4-dimensional second-order channels, batch normalization, and AdamW optimizer with learning rate .
The filter design ensures equivariance and locality around the ring topology, facilitating accurate inference of puckering modes needed for ring closure while handling all symmetry aspects of the CP manifold.
4. Data Preparation, Training, and Closed-Ring Constraints
Training data comprises 15 204 unique conformers of five- to eight-membered non-aromatic rings, sourced from COD, PQR, ZINC, and platinum-ligand datasets. Rings are hydrogenated when substituent hybridizations are preserved. Preprocessing and training proceed as follows:
- CP coordinates are computed using a ring-puckering library.
- Dictionaries of average bond lengths and angles, keyed by atomic and ring identity, are constructed on the training splits, with a nearest-neighbor fallback for missing motifs based on empirically weighted metrics.
- Training enforces feasibility conditions: prior amplitude bounds (e.g., 0.8 Å, 0.56 Å, 0.4 Å), projected bond lengths (), projected angles (), and convex-ring assumption. Less than 0.01% of conformers are discarded as non-reconstructible.
- Models are trained for 300 epochs with AdamW and standard weight decay; no explicit diffusion noise is used.
These steps guarantee valid, physically consistent samples both in learning and inference.
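The role of the projected bond lengths and angles in the feasibility check can be illustrated with elementary geometry. The sketch below is an assumption-laden illustration, not the authors' implementation: the exact inequalities used in training are not reproduced in this section, so the code simply requires that every bond and every 1-3 distance keeps a positive in-plane component once its out-of-plane part is removed. The function name `project_ring` is hypothetical.

```python
import numpy as np

def project_ring(z, bond_lengths, bond_angles):
    """Project bond lengths and angles onto the mean plane.

    z            : out-of-plane displacements z_j, shape (N,)
    bond_lengths : bond_lengths[j] joins atom j and atom j+1 (cyclic), shape (N,)
    bond_angles  : bond_angles[j] is the angle at atom j in radians, shape (N,)
    Returns (projected_lengths, projected_angles, feasible).
    """
    z = np.asarray(z, dtype=float)
    r = np.asarray(bond_lengths, dtype=float)
    beta = np.asarray(bond_angles, dtype=float)
    z_next, z_prev = np.roll(z, -1), np.roll(z, 1)
    r_prev = np.roll(r, 1)                        # bond joining atom j-1 and atom j

    # Pythagoras: in-plane part of each bond after removing its z component.
    len_sq = r ** 2 - (z_next - z) ** 2
    # 1-3 distance across atom j (law of cosines), then its in-plane part.
    d13_sq = r_prev ** 2 + r ** 2 - 2.0 * r_prev * r * np.cos(beta)
    d13_proj_sq = d13_sq - (z_next - z_prev) ** 2

    feasible = bool(np.all(len_sq > 0.0) and np.all(d13_proj_sq > 0.0))
    if not feasible:
        return None, None, False
    proj_len = np.sqrt(len_sq)
    proj_prev = np.roll(proj_len, 1)
    cos_proj = (proj_prev ** 2 + proj_len ** 2 - d13_proj_sq) / (2.0 * proj_prev * proj_len)
    proj_angles = np.arccos(np.clip(cos_proj, -1.0, 1.0))   # clip guards rounding error
    return proj_len, proj_angles, True

# A planar regular hexagon projects onto itself; an over-puckered ring is rejected.
n = 6
flat_len, flat_ang, flat_ok = project_ring(np.zeros(n), np.full(n, 1.5),
                                           np.full(n, np.deg2rad(120.0)))
_, _, steep_ok = project_ring(np.array([1.0, -1.0] * 3), np.full(n, 1.5),
                              np.full(n, np.deg2rad(111.0)))
print(flat_ok, steep_ok)
```

A sample whose displacements between bonded atoms exceed the bond length cannot correspond to any real geometry, which is why bounding the prior amplitudes keeps most samples reconstructible.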
5. Sampling, Geometry Reconstruction, and Computational Workflow
Conformer generation follows an ODE-based sampling mechanism:
- Sample initial CP coordinates $\mathbf{q}_0 \sim p_0$, where $p_0$ is a uniform prior within physically valid amplitude bounds.
- Integrate the learned ODE with steps (typically , though quality remains high for –5):
- At $t = 1$, obtain the final CP coordinates, then reconstruct Cartesian coordinates:
  - Recover the out-of-plane displacements $z_j$ by inverse Fourier sums.
  - Compute projected bond lengths and angles algebraically.
  - Stitch the ring via Cremer’s algorithm using three planar segments before re-attaching out-of-plane displacements.
This procedure ensures direct control over conformer diversity and ring validity at every stage.
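The first stages of this workflow (prior sampling, ODE integration, and the inverse Fourier sum) can be sketched as follows. Several things here are assumptions for illustration only: fixed-step Euler as the integrator, the placeholder prior bounds, the toy vector field standing in for the trained network, and the flat treatment of the phase angle. The names `euler_sample` and `displacements_from_cp` are hypothetical, and the ring-stitching step is omitted.

```python
import numpy as np

def euler_sample(v_fn, x0, n_steps):
    """Integrate dx/dt = v_fn(x, t) from t = 0 to t = 1 with fixed-step Euler."""
    x = np.array(x0, dtype=float)
    h = 1.0 / n_steps
    for k in range(n_steps):
        x = x + h * v_fn(x, k * h)
    return x

def displacements_from_cp(amplitudes, phases, n):
    """Inverse Fourier sum: CP amplitudes and phases -> out-of-plane z_j."""
    theta = 2.0 * np.pi * np.arange(n) / n
    z = np.zeros(n)
    for i, phi in enumerate(phases):             # paired modes m = 2, 3, ...
        m = i + 2
        z += np.sqrt(2.0 / n) * amplitudes[i] * np.cos(phi + m * theta)
    if n % 2 == 0:                               # unpaired mode for even N
        z += amplitudes[-1] * (-1.0) ** np.arange(n) / np.sqrt(n)
    return z

# Six-membered ring: state x = (q2, phi2, q3). Bounds below are placeholders.
rng = np.random.default_rng(0)
x0 = rng.uniform([0.0, 0.0, -0.8], [0.8, 2.0 * np.pi, 0.8])
chair = np.array([0.0, 0.0, 0.6])                  # toy target: pure q3 pucker
toy_field = lambda x, t: (chair - x) / (1.0 - t)   # stands in for the trained network
x1 = euler_sample(toy_field, x0, n_steps=5)
z = displacements_from_cp(amplitudes=[x1[0], x1[2]], phases=[x1[1]], n=6)
print(np.round(z, 3))
```

The toy field points straight at the target, so even a handful of Euler steps lands on it exactly; a learned field is generally curved, which is where the step count starts to matter.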
6. Quantitative Evaluation and Comparative Performance
Conformational fidelity is quantified using:
- Average Minimum RMSD (AMR): precision (AMR-P, average minimum RMSD from generated to reference) and recall (AMR-R, average minimum RMSD from reference to generated).
- Coverage fraction (COV) within 0.1 Å RMSD: COV-P (precision) and COV-R (recall).
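Given a matrix of pairwise RMSD values between generated and reference conformers, these four metrics reduce to row-wise and column-wise minima. The sketch below assumes a strict "less than threshold" comparison for coverage and a hypothetical function name `amr_cov`; the RMSD values are made up for illustration.

```python
import numpy as np

def amr_cov(rmsd, threshold=0.1):
    """rmsd[i, j] = RMSD (in Å) between generated conformer i and reference conformer j."""
    rmsd = np.asarray(rmsd, dtype=float)
    min_gen = rmsd.min(axis=1)   # each generated conformer to its closest reference
    min_ref = rmsd.min(axis=0)   # each reference conformer to its closest generated
    return {
        "AMR-P": float(min_gen.mean()),
        "COV-P": float(100.0 * np.mean(min_gen < threshold)),
        "AMR-R": float(min_ref.mean()),
        "COV-R": float(100.0 * np.mean(min_ref < threshold)),
    }

# Three generated conformers scored against two reference conformers.
rmsd = np.array([[0.05, 0.40],
                 [0.30, 0.08],
                 [0.50, 0.60]])
metrics = amr_cov(rmsd)
print(metrics)
```

Precision penalizes generated conformers that are far from every reference structure, while recall penalizes reference structures that no generated conformer reproduces.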
Performance is summarized in the following table (for unrelaxed, puckering-only RMSD):
| Method | AMR-P (Å) / COV-P (%) | AMR-R (Å) / COV-R (%) |
|---|---|---|
| PuckerFlow | 0.13 / 67.5 | 0.09 / 75.8 |
| MCF (Euclidean flow) | 0.16 / 46.2 | 0.12 / 60.0 |
| GeoDiff (diffusion) | 0.24 / 24.9 | 0.18 / 42.4 |
| RDKit ETKDG | 0.17 / 51.4 | 0.13 / 60.1 |
For full-atom RMSD:
| Method | AMR-P (Å) / COV-P (%) | AMR-R (Å) / COV-R (%) |
|---|---|---|
| PuckerFlow | 0.18 / 47.4 | 0.15 / 51.0 |
| MCF | 0.23 / 24.6 | 0.19 / 35.6 |
| GeoDiff | 0.28 / 16.4 | 0.22 / 30.5 |
After MMFF94 relaxation, performance gaps narrow but PuckerFlow remains best in precision and competitive in recall. PuckerFlow uses approximately parameters versus for MCF and achieves similar or better results with far fewer inference steps (2–30 versus 50–5000) (Schaufelberger et al., 19 Jan 2026).
7. Applications, Domain Coverage, and Limitations
PuckerFlow accurately samples multimodal puckering distributions for 5–8-membered monocyclic rings, including envelopes, boats, chairs, pseudorotation circles, and mixed heteroatom scaffolds (e.g., thiazines, azasilinanes, oxazepanes, phosphocanes), which are pertinent to drug and catalyst design workflows. Full-molecule conformers can be synthesized by augmenting generated ring cores with exocyclic substituents using established algorithms (e.g., RDKit ETKDG, pretrained torsional-diffusion models).
The method’s end-to-end differentiable structure, gradient-based flow objective, and direct operation in internal coordinate space render it suitable for property-guided conformer generation (for example, targeting docking scores or substrate geometries), capabilities not afforded by discrete post-processing methods.
Current limitations include restriction to small and medium monocyclic rings ($5 \le N \le 8$), reliance on convex planar projections, and minimal substituent diversity. Planned extensions include handling macrocycles, fused and spiro rings (via multiple CP subsystems), and joint modeling of ring and torsional degrees of freedom (Schaufelberger et al., 19 Jan 2026).