FLOWR: Flow Matching for Structure-Aware De Novo, Interaction- and Fragment-Based Ligand Generation (2504.10564v2)

Published 14 Apr 2025 in q-bio.QM, cs.LG, and q-bio.BM

Abstract: We introduce FLOWR, a novel structure-based framework for the generation and optimization of three-dimensional ligands. FLOWR integrates continuous and categorical flow matching with equivariant optimal transport, enhanced by an efficient protein pocket conditioning. Alongside FLOWR, we present SPINDR, a thoroughly curated dataset comprising ligand-pocket co-crystal complexes specifically designed to address existing data quality issues. Empirical evaluations demonstrate that FLOWR surpasses current state-of-the-art diffusion- and flow-based methods in terms of PoseBusters-validity, pose accuracy, and interaction recovery, while offering a significant inference speedup, achieving up to 70-fold faster performance. In addition, we introduce FLOWR:multi, a highly accurate multi-purpose model allowing for the targeted sampling of novel ligands that adhere to predefined interaction profiles and chemical substructures for fragment-based design without the need of re-training or any re-sampling strategies.

Summary

The paper introduces FLOWR, which integrates continuous and categorical flow matching with equivariant optimal transport and protein pocket conditioning for de novo ligand generation.
It demonstrates up to 70-fold faster inference and better ligand quality, bond geometries, and interaction recovery than diffusion- and flow-based baselines.
The study also presents SPINDR, a curated dataset of ligand-pocket co-crystal complexes that mitigates data leakage and serves as a benchmark for generative ligand design.

Insights into FLOWR: Generative Flow Matching for Ligand Design

The paper introduces FLOWR, a framework for structure-based drug discovery that focuses on de novo generation of three-dimensional ligands. FLOWR combines continuous and categorical flow matching with equivariant optimal transport and conditions generation on the protein pocket. Alongside the model, the paper presents SPINDR, a new dataset built to train and benchmark such generative models.

FLOWR Model Overview

FLOWR builds geometry awareness directly into the generative model. Key components of the model include:

Flow Matching Techniques: FLOWR uses continuous flow matching for ligand coordinates and categorical (discrete) flow matching for atom types and bond orders, so spatial and chemical variables are generated together.
Equivariant Modelling: Equivariant methods ensure that generation respects the rotational and translational symmetries of molecular structures, so a rotated or translated pocket yields a correspondingly transformed ligand.
Efficiency Focus: Inference is up to 70-fold faster than prior models, which makes FLOWR practical for large-scale sampling in drug discovery workflows.
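
The components listed above can be illustrated with a short sketch. This is a generic example of conditional flow matching with a rotational alignment step, not the paper's implementation: the helper names (`kabsch_align`, `interpolate_coords`, `interpolate_types`), the Gaussian coordinate prior, the uniform prior over atom types, and the linear time schedule are all assumptions, since this summary does not describe the exact choices made in FLOWR. Equivariant optimal transport is often described as also searching over atom permutations; only the rotation step is shown here.

```python
import numpy as np


def kabsch_align(x0, x1):
    """Center both point sets and rotate x0 onto x1 (least-squares, proper rotation)."""
    x0c = x0 - x0.mean(axis=0)
    x1c = x1 - x1.mean(axis=0)
    u, _, vt = np.linalg.svd(x0c.T @ x1c)
    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return x0c @ rot.T


def interpolate_coords(x0, x1, t):
    """Continuous part: point on the straight path from noise x0 to data x1,
    plus the constant velocity a network would be trained to regress."""
    x_t = (1.0 - t) * x0 + t * x1
    return x_t, x1 - x0


def interpolate_types(c0, c1, t, rng):
    """Categorical part: each atom takes its data type with probability t,
    otherwise it keeps the type drawn from the prior."""
    take_data = rng.random(c1.shape) < t
    return np.where(take_data, c1, c0)


# Toy "ligand" with made-up sizes (assumption: 5 atoms, 4 atom types).
rng = np.random.default_rng(0)
n_atoms, n_types = 5, 4
x1 = rng.normal(size=(n_atoms, 3))
x1 -= x1.mean(axis=0)                              # centered data coordinates
x0 = kabsch_align(rng.normal(size=(n_atoms, 3)), x1)  # noise aligned to the data
c1 = rng.integers(0, n_types, size=n_atoms)        # data atom types
c0 = rng.integers(0, n_types, size=n_atoms)        # prior atom types

t = 0.3
x_t, velocity = interpolate_coords(x0, x1, t)
c_t = interpolate_types(c0, c1, t, rng)
```

Aligning the noise sample to the data before interpolating shortens the paths the model has to learn, which is the usual motivation given for optimal-transport couplings in flow matching.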

SPINDR Dataset

The SPINDR dataset is developed as a high-quality benchmark for 3D generative models, addressing critical deficiencies in existing ligand-protein complex datasets:

Data Quality: SPINDR is extensively curated to resolve issues such as missing loop conformations and incorrect protonation states, which are prevalent in datasets like CrossDocked2020 and PDBBind.
Bias Mitigation: By adopting the PLINDER data splits, SPINDR mitigates the risk of data leakage between training and test sets, allowing a more realistic assessment of model generalization.

Experimental Validation

The empirical analysis shows that FLOWR outperforms competing models such as PILOT and several diffusion-based models:

Numerical Superiority: FLOWR improves on key metrics, including RDKit- and PoseBusters-validity, Vina scores, and Wasserstein distances for bond-angle and bond-length distributions.
Ligand Quality: FLOWR generates ligands with substantially lower strain energies and improved interaction recovery rates, indicating more physically plausible ligand geometries.
Computational Efficiency: With inference up to 70-fold faster, FLOWR supports quicker design iterations, which matters for drug development cycles.
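
To make the bond-geometry metric concrete, here is a minimal sketch of a one-dimensional Wasserstein distance between a reference and a generated bond-length sample. The function name `wasserstein_1d` and the bond lengths are illustrative assumptions, not values or code from the paper, and the paper's exact evaluation protocol is not described in this summary. The sketch covers only equal-size samples; `scipy.stats.wasserstein_distance` handles the general case.

```python
def wasserstein_1d(u, v):
    """W1 distance between two equal-size 1D samples:
    the mean absolute gap between their sorted values."""
    if len(u) != len(v) or len(u) == 0:
        raise ValueError("this sketch needs two non-empty samples of equal size")
    return sum(abs(a - b) for a, b in zip(sorted(u), sorted(v))) / len(u)


# Made-up bond lengths in angstroms, for illustration only.
reference_lengths = [1.52, 1.40, 1.34, 1.09]
generated_lengths = [1.50, 1.43, 1.36, 1.10]

distance = wasserstein_1d(reference_lengths, generated_lengths)
```

A smaller distance means the generated bond-length distribution sits closer to the reference distribution; identical samples give zero.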

Interaction-Conditional and Multi-Conditional Capabilities

FLOWR:multi, a multi-purpose extension of the primary model, shows versatility in:

Interaction-Conditional Generation: It significantly improves interaction recovery, making it well suited to tasks that require high fidelity to predefined interaction profiles.
Fragment-Based Applications: FLOWR:multi supports scaffold hopping and other fragment-based design approaches without retraining or re-sampling strategies, enabling targeted ligand discovery.
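
Interaction recovery can be pictured as a set comparison between the interactions of a reference ligand and those of a generated one. The sketch below assumes each interaction is encoded as a (residue, interaction type) pair and defines recovery as the fraction of reference interactions that reappear; both the encoding and the definition are assumptions, since this summary does not give the paper's exact fingerprint or formula. The residue labels are hypothetical.

```python
def interaction_recovery(reference, generated):
    """Fraction of reference interactions that also occur in the generated pose."""
    reference = set(reference)
    if not reference:
        raise ValueError("the reference interaction profile is empty")
    return len(reference & set(generated)) / len(reference)


# Hypothetical interaction profiles: (pocket residue, interaction type).
reference = {
    ("ASP86", "hbond"),
    ("PHE80", "pi_stacking"),
    ("LEU83", "hydrophobic"),
    ("LYS33", "hbond"),
}
generated = {
    ("ASP86", "hbond"),
    ("PHE80", "pi_stacking"),
    ("LEU83", "hydrophobic"),
    ("GLU81", "hbond"),  # extra contact that is not in the reference
}

recovery = interaction_recovery(reference, generated)
```

Extra interactions in the generated pose do not lower this score; a stricter metric could penalize them, for example with a Jaccard index.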

Broader Implications

The paper positions FLOWR and SPINDR as practical tools for AI-driven drug discovery. By addressing both the geometric and chemical challenges of ligand generation, the approach aims to make structure-aware design more reliable and more widely applicable. The SPINDR dataset also provides a more rigorous basis for evaluating generative models in future work.

FLOWR combines state-of-the-art computational methods with rigorously curated data. Future developments could further optimize these models, especially in handling explicit hydrogen configurations and extending the chemical space sampled during training. This could lead to more accurate and efficient models for real-world applications in medicinal chemistry and pharmaceutical research.
