
Structure-based drug design by denoising voxel grids (2405.03961v2)

Published 7 May 2024 in cs.LG and q-bio.BM

Abstract: We present VoxBind, a new score-based generative model for 3D molecules conditioned on protein structures. Our approach represents molecules as 3D atomic density grids and leverages a 3D voxel-denoising network for learning and generation. We extend the neural empirical Bayes formalism (Saremi & Hyvarinen, 2019) to the conditional setting and generate structure-conditioned molecules with a two-step procedure: (i) sample noisy molecules from the Gaussian-smoothed conditional distribution with underdamped Langevin MCMC using the learned score function and (ii) estimate clean molecules from the noisy samples with single-step denoising. Compared to the current state of the art, our model is simpler to train, significantly faster to sample from, and achieves better results on extensive in silico benchmarks -- the generated molecules are more diverse, exhibit fewer steric clashes, and bind with higher affinity to protein pockets. The code is available at https://github.com/genentech/voxbind/.

References (76)
  1. Equivariant shape-conditioned generation of 3D molecules for ligand-based drug design. arXiv:2210.04893, 2022.
  2. Segdiff: Image segmentation with diffusion probabilistic models. arXiv:2112.00390, 2021.
  3. Anderson, A. C. The process of structure-based drug design. Chemistry & biology, 2003.
  4. Are transformers more robust than cnns? NeurIPS, 2021.
  5. Why is tanimoto index an appropriate choice for fingerprint-based similarity calculations? Journal of cheminformatics, 2015.
  6. MACE: Higher order equivariant message passing neural networks for fast and accurate force fields. NeurIPS, 2022.
  7. Quantifying the chemical beauty of drugs. Nature chemistry, 2012.
  8. Blundell, T. L. Structure-based drug design. Nature, 1996.
  9. AutoDock Vina 1.2.0: New docking methods, expanded force field, and Python bindings. JCIM, 2021.
  10. Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks, 2018.
  11. Estimation of synthetic accessibility score of drug-like molecules based on molecular complexity and fragment contributions. Journal of cheminformatics, 2009.
  12. Virtual exploration of the small-molecule chemical universe below 160 daltons. Angewandte Chemie International Edition, 2005.
  13. Language models can generate molecules, materials, and protein binding sites directly in three dimensions as xyz, cif, and pdb files. arXiv:2305.05708, 2023.
  14. Three-dimensional convolutional neural networks and a cross-docked data set for structure-based drug design. Journal of chemical information and modeling, 2020.
  15. Protein discovery with discrete walk-jump sampling. In ICLR, 2024.
  16. Generating equilibrium molecules with deep neural networks. arXiv:1810.11347, 2018.
  17. Symmetry-adapted generation of 3D point sets for the targeted discovery of molecules. In NeurIPS, 2019.
  18. E3nn: Euclidean neural networks. arXiv:2207.09453, 2022.
  19. The Lie derivative for measuring learned equivariance. arXiv:2210.02984, 2022.
  20. 3D equivariant diffusion for target-aware molecule generation and affinity prediction. ICLR, 2023a.
  21. DecompDiff: Diffusion models with decomposed priors for structure-based drug design. In ICML, 2023b.
  22. Halgren, T. A. Merck molecular force field. i. basis, form, scope, parameterization, and performance of mmff94. Journal of computational chemistry, 1996.
  23. Benchmarking generated poses: How rational is structure-based drug design with generative models? arXiv:2308.0741, 2023.
  24. Equivariant diffusion for molecule generation in 3D. In ICML, 2022.
  25. Mdm: Molecular diffusion model for 3D molecule generation. arXiv:2209.05710, 2022.
  26. Hyvärinen, A. Estimation of non-normalized statistical models by score matching. JMLR, 2005.
  27. Auto-encoding variational Bayes. In ICLR, 2014.
  28. Equivariant flows: exact likelihood generative learning for symmetric densities. In ICML, 2020.
  29. Landrum, G. Rdkit: Open-source cheminformatics software, 2016. URL https://github.com/rdkit/rdkit/releases/tag/Release_2016_09_4.
  30. On the modeling of polar component of solvation energy using smooth gaussian-based dielectric function. Journal of Theoretical and Computational Chemistry, 2014.
  31. Generating 3D molecules for target protein binding. arXiv, 2022.
  32. Zero-shot 3d drug design by sketching and generating. NeurIPS, 2022.
  33. Decoupled weight decay regularization. In ICLR, 2019.
  34. Repaint: Inpainting using denoising diffusion probabilistic models. In CVPR, 2022.
  35. A 3D generative model for structure-based drug design. NeurIPS, 2021.
  36. An autoregressive flow model for 3D molecular geometry generation from scratch. In ICLR, 2022.
  37. Ultra-large library docking for discovering new chemotypes. Nature, 2019.
  38. Miyasawa, K. An empirical Bayes estimator of the mean of a normal population. Bull. Inst. Internat. Statistics, 1961.
  39. Weisfeiler and leman go neural: Higher-order graph neural networks. In AAAI, 2019.
  40. Open babel: An open chemical toolbox. Journal of cheminformatics, 2011.
  41. Pyuul provides an interface between biological structures and deep learning algorithms. Nature communications, 2022.
  42. Pocket2mol: Efficient molecular sampling based on 3D protein pockets. In ICML, 2022.
  43. 3D molecule generation by denoising voxel grids. In NeurIPS, 2023.
  44. Geometric deep learning for structure-based ligand design. ACS Central Science, 2023.
  45. Incompleteness of graph convolutional neural networks for points clouds in three dimensions. arXiv:2201.07136, 2022.
  46. Learning a continuous representation of 3D molecular structures with deep generative models. In NeurIPS, Structural Biology workshop, 2020.
  47. Generating 3D molecules conditional on receptor binding sites with deep generative models. Chemical science, 2022.
  48. Uff, a full periodic table force field for molecular mechanics and molecular dynamics simulations. Journal of the American chemical society, 1992.
  49. Variational inference with normalizing flows. In ICML, 2015.
  50. Robbins, H. E. An empirical Bayes approach to statistics. In Proc. 3rd Berkeley Symp. Math. Statist. Probab., 1956.
  51. High-resolution image synthesis with latent diffusion models. In CVPR, 2022.
  52. U-Net: Convolutional networks for biomedical image segmentation. In MICCAI, 2015.
  53. Alphaspace: Fragment-centric topographical mapping to target protein–protein interaction interfaces. Journal of chemical information and modeling, 2015.
  54. Structure-based drug design via semi-equivariant conditional normalizing flows. In ICLR, Machine Learning for Drug Discovery workshop, 2023.
  55. Langevin dynamics with variable coefficients and nonconservative forces: from stationary states to numerical methods. Entropy, 2017.
  56. Palette: Image-to-image diffusion models. In SIGGRAPH, 2022a.
  57. Photorealistic text-to-image diffusion models with deep language understanding. In NeurIPS, 2022b.
  58. Image super-resolution via iterative refinement. PAMI, 2022c.
  59. Neural empirical Bayes. JMLR, 2019.
  60. Universal smoothed score functions for generative modeling. arXiv:2303.11669, 2023.
  61. E(n) equivariant graph neural networks. In ICML, 2021.
  62. The surprising effectiveness of diffusion models for optical flow and monocular depth estimation. arXiv:2306.01923, 2023.
  63. Structure-based drug design with equivariant diffusion models. arXiv:2210.13695, 2022.
  64. Shape-based generative modeling for de novo drug design. Journal of chemical information and modeling, 2019.
  65. Deep unsupervised learning using nonequilibrium thermodynamics. In ICML, 2015.
  66. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nature biotechnology, 2017.
  67. Integrating structure-based approaches in generative molecular design. Current Opinion in Structural Biology, 2023.
  68. Atom3D: Tasks on molecules in three dimensions. NeurIPS, 2020.
  69. Midi: Mixed graph and 3D denoising diffusion for molecule generation. ICLR, MLDD workshop, 2023.
  70. A pocket-based 3D molecule generative model fueled by experimental electron density. Scientific reports, 2022a.
  71. Relation: A deep generative model for structure-based de novo drug design. Journal of Medicinal Chemistry, 2022b.
  72. Generating molecular conformer fields. arXiv:2311.17932, 2023.
  73. 3D steerable cnns: Learning rotationally equivariant features in volumetric data. NeurIPS, 2018.
  74. How powerful are graph neural networks? ICLR, 2019.
  75. Geometric latent diffusion models for 3D molecule generation. In ICML, 2023.
  76. Molecule generation for target protein binding with structural motifs. In ICLR, 2023.

Summary

  • The paper introduces VoxBind, a score-based generative model that represents molecules as 3D voxel grids and generates ligands conditioned on protein structures.
  • Generation takes two steps: underdamped Langevin MCMC samples noisy molecules using the learned score function, and a single denoising step with a 3D U-Net convolutional network recovers clean molecules.
  • On in silico benchmarks, VoxBind is reported to outperform state-of-the-art methods: its molecules bind with higher affinity, are more diverse, and exhibit fewer steric clashes, while sampling is significantly faster.

Understanding VoxBind: Voxel-Based Generative Model for Structure-Based Drug Design

Introduction

Structure-based drug design (SBDD) is a central problem in computational drug discovery. The goal is to generate molecules, or ligands, that bind effectively to specific biomolecular structures, such as protein pockets. Traditional methods often rely on virtual screening, where databases of molecules are searched to identify potential candidates. However, exhaustive search becomes inefficient as the chemical space to be covered grows.

VoxBind is a voxel-based, score-based generative model that aims to improve on existing SBDD methods. The sections below cover its main components and reported results.

Key Components of VoxBind

3D Voxel Representation

A central design choice in VoxBind is its use of 3D voxel grids to represent molecules. Each molecule is discretized into a three-dimensional grid in which each voxel (a 3D pixel) stores the atomic density at that location. This contrasts with point-cloud approaches, which treat atoms as discrete points in space.

Voxel grids offer two main advantages:

  • Expressive Representation: Voxel grids can capture detailed 3D patterns and surfaces, making them particularly suitable for modeling the intricate shapes of protein pockets and molecules.
  • Adaptation from Image Generation: The architecture used in VoxBind draws on image generation techniques that have already proven effective in computer vision.
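
To make the representation concrete, here is a minimal voxelization sketch. It is an illustration only, not VoxBind's code: the function name `voxelize`, the Gaussian density shape, and the grid size, resolution, and radius values are all assumptions chosen for readability.

```python
import numpy as np

def voxelize(coords, channels, n_channels, grid_dim=32, resolution=0.25, radius=0.5):
    """Render atoms as smooth density blobs on a (n_channels, D, D, D) grid.

    coords:   (N, 3) atom positions in angstroms, centered near the origin
    channels: (N,) integer channel index per atom (e.g. one channel per element)
    All default values are illustrative assumptions, not the paper's settings.
    """
    # Voxel-center coordinates along one axis, symmetric about zero.
    axis = (np.arange(grid_dim) - (grid_dim - 1) / 2.0) * resolution
    gx, gy, gz = np.meshgrid(axis, axis, axis, indexing="ij")
    grid = np.zeros((n_channels, grid_dim, grid_dim, grid_dim))
    for (x, y, z), c in zip(coords, channels):
        d2 = (gx - x) ** 2 + (gy - y) ** 2 + (gz - z) ** 2
        # Assumed Gaussian density that decays with distance from the atom center.
        grid[c] += np.exp(-d2 / (2.0 * radius ** 2))
    return grid

# Two atoms placed in separate channels (e.g. two different elements).
coords = np.array([[0.0, 0.0, 0.0], [1.2, 0.0, 0.0]])
channels = np.array([0, 1])
grid = voxelize(coords, channels, n_channels=2)
print(grid.shape)  # (2, 32, 32, 32)
```

Using one channel per atom type mirrors how color channels work in images, which is what lets image-style convolutional networks operate on the grid directly.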

Neural Empirical Bayes (NEB) Framework

VoxBind extends the NEB framework (Saremi & Hyvarinen, 2019) to conditional generation, where the condition is the protein structure. Generation proceeds in two steps:

  1. Sampling Noisy Molecules: Underdamped Langevin Markov Chain Monte Carlo (MCMC), driven by the learned score function, draws noisy molecules from the Gaussian-smoothed conditional distribution.
  2. Denoising: A single denoising step with a convolutional neural network (CNN) estimates the clean molecule from each noisy sample.
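
The two steps above can be sketched on a toy problem. This is a hedged illustration, not the paper's sampler: the function `walk_jump`, the BAOAB-style discretization of underdamped Langevin dynamics, and every hyperparameter are assumptions. A closed-form score stands in for the learned network, and in the conditional setting the `score` callable would also depend on the protein pocket.

```python
import numpy as np

def walk_jump(score, y0, sigma, n_steps=500, step=0.1, friction=1.0, seed=0):
    """Sample a noisy y with underdamped Langevin MCMC, then denoise once.

    score(y) should approximate the gradient of log p(y) for the smoothed density.
    """
    rng = np.random.default_rng(seed)
    y = np.array(y0, dtype=float)
    v = np.zeros_like(y)                       # auxiliary velocity variable
    decay = np.exp(-friction * step)
    for _ in range(n_steps):
        v += 0.5 * step * score(y)             # half kick from the score
        y += 0.5 * step * v                    # half drift
        # Exact Ornstein-Uhlenbeck update: friction plus fresh Gaussian noise.
        v = decay * v + np.sqrt(1.0 - decay ** 2) * rng.standard_normal(y.shape)
        y += 0.5 * step * v                    # half drift
        v += 0.5 * step * score(y)             # half kick
    # Jump: single-step empirical Bayes estimate of the clean sample.
    x_hat = y + sigma ** 2 * score(y)
    return y, x_hat

# Toy check: the data is a single point mu, so the smoothed density is
# N(mu, sigma^2 I) and its score is known in closed form.
sigma = 1.0
mu = np.array([2.0, -1.0, 0.5])
score = lambda y: (mu - y) / sigma ** 2
y, x_hat = walk_jump(score, np.zeros(3), sigma)
print(y)      # a noisy sample scattered around mu
print(x_hat)  # equals mu up to floating-point rounding
```

In this toy case the denoising step recovers `mu` exactly because the score is exact; with a learned score the estimate is only approximate.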

Training and Architecture

The training involves a conditional denoising model that consists of two main parts:

  • Encoders: Separate encoders for noisy ligands and protein pockets.
  • U-Net Architecture: A 3D U-Net structure processes the combined encoder output to produce the final denoised molecule.

The design relies on 3D convolutional filters, which capture local spatial patterns while sharing weights across the grid; this keeps the model computationally efficient.
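
For reference, the standard neural empirical Bayes relations connect the denoiser to the score used during sampling. The block below writes them in a conditional form as a sketch; the symbols (x for the clean voxelized ligand, c for the pocket, y for the noisy ligand, σ for the noise level, θ for the network parameters) are notation chosen here, and the exact loss weighting used in the paper may differ.

```latex
% Noisy ligand obtained by adding isotropic Gaussian noise to the clean one
y = x + \sigma \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, I)

% Least-squares denoising objective for a denoiser \hat{x}_\theta conditioned on the pocket c
\mathcal{L}(\theta) = \mathbb{E}_{(x, c),\, \varepsilon}
    \left\| x - \hat{x}_\theta(y, c) \right\|^2

% Empirical Bayes (Miyasawa/Tweedie) identity: the optimal denoiser gives the smoothed score
\hat{x}(y, c) = y + \sigma^2 \nabla_y \log p(y \mid c)
\quad\Longrightarrow\quad
\nabla_y \log p(y \mid c) \approx \frac{\hat{x}_\theta(y, c) - y}{\sigma^2}
```

The last line is why a single trained denoiser suffices: it supplies both the score for Langevin sampling and the final denoising step.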

Results and Performance

VoxBind compares favorably with state-of-the-art methods such as TargetDiff and DecompDiff. Key findings:

  • Binding Affinity: VoxBind produces molecules with higher binding affinities, an essential metric for SBDD. In benchmark comparisons, VoxBind achieved the best VinaScore and VinaDock among competing models.
  • Efficiency in Training and Sampling: VoxBind is simpler to train and significantly faster to sample from, in some cases by about an order of magnitude.

Practical Implications and Future Directions

VoxBind's framework allows for flexibility in practical applications, such as initializing the model with fragments of molecules rather than starting from scratch. This property can be leveraged for scaffold hopping or linking tasks, similar to in-painting tasks in image generation.

However, the method does have some limitations. The use of voxel grids can consume significant memory, which might limit the model's scalability for larger biomolecules like nucleic acids or full proteins. Future work could focus on optimizing data representation and architecture to overcome these limitations.

Conclusion

VoxBind offers a compelling alternative to existing SBDD methods, with reported improvements in binding affinity and sampling efficiency. It achieves this by combining voxel representations with score-based generative modeling. Future developments might scale the model to larger and more complex biomolecules, broadening the reach of computational drug design.
