Scalable Normalizing Flows Enable Boltzmann Generators for Macromolecules (2401.04246v1)

Published 8 Jan 2024 in cs.LG and q-bio.BM

Abstract: The Boltzmann distribution of a protein provides a roadmap to all of its functional states. Normalizing flows are a promising tool for modeling this distribution, but current methods are computationally intractable for typical pharmacological targets due to the size of the system, the heterogeneity of intra-molecular potential energy terms, and long-range interactions. To remedy these issues, we present a novel flow architecture that uses split channels and gated attention to efficiently learn the conformational distribution of proteins defined by internal coordinates. We show that a 2-Wasserstein loss smooths the transition from maximum-likelihood training to energy-based training, enabling the training of Boltzmann Generators for macromolecules. We evaluate our model and training strategy on the villin headpiece HP35(nle-nle), a 35-residue subdomain, and on protein G, a 56-residue protein. We demonstrate that standard architectures and training strategies, such as maximum likelihood alone, fail, while our architecture and multi-stage training strategy successfully model the conformational distributions of protein G and HP35.
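The abstract's multi-stage strategy hinges on using a 2-Wasserstein term to bridge maximum-likelihood and energy-based training. As a rough illustration only (the function names, the 1-D restriction, and the interpolation weights below are assumptions for the sketch, not the paper's actual objective): in one dimension the squared 2-Wasserstein distance between equal-size empirical samples has a closed form via sorted samples, and a blended loss can interpolate between the two training regimes.

```python
import numpy as np

def w2_empirical_1d(x, y):
    """Squared 2-Wasserstein distance between two equal-size 1-D samples.
    In 1-D the optimal transport plan simply matches sorted samples."""
    xs, ys = np.sort(x), np.sort(y)
    return float(np.mean((xs - ys) ** 2))

def blended_loss(nll, energy_kl, w2, alpha):
    """Hypothetical multi-stage objective: alpha=0 is pure maximum
    likelihood (NLL), alpha=1 is pure energy-based (reverse-KL) training,
    and a W2 term is weighted most heavily mid-transition. The weighting
    scheme here is illustrative, not taken from the paper."""
    return (1.0 - alpha) * nll + alpha * energy_kl + alpha * (1.0 - alpha) * w2
```

Sweeping `alpha` from 0 to 1 over training stages would then move the objective smoothly from density estimation on samples to matching the Boltzmann energy, which is the kind of transition the abstract describes.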
