Backdiff: a diffusion model for generalized transferable protein backmapping (2310.01768v2)
Abstract: Coarse-grained (CG) models play a crucial role in the study of protein structures, protein thermodynamic properties, and protein conformational dynamics. Because coarse-graining discards information, backmapping from CG to all-atom configurations is essential in many protein design and drug discovery applications where detailed atomic representations are needed for in-depth studies. Despite recent progress in data-driven backmapping approaches, devising a backmapping method that can be applied universally across CG models and proteins remains an open problem. In this work, we propose BackDiff, a new generative model designed to achieve generalization and reliability in the protein backmapping problem. BackDiff leverages a conditional score-based diffusion model with geometric representations. Different CG models can contain different coarse-grained sites, which include selected atoms (CG atoms) and simple CG auxiliary functions of atomistic coordinates (CG auxiliary variables). We therefore design a self-supervised training framework that adapts to different CG atoms, and we constrain the diffusion sampling paths with arbitrary CG auxiliary variables as conditions. Our method supports end-to-end training and allows efficient sampling across different proteins and diverse CG models without retraining. Comprehensive experiments over multiple popular CG models demonstrate that BackDiff outperforms existing state-of-the-art approaches and offers generalization and flexibility that those approaches cannot achieve. A pretrained BackDiff model can offer a convenient yet reliable plug-and-play solution for protein researchers, enabling them to carry out further investigation starting from their own CG models.
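To make the distinction between CG atoms and CG auxiliary variables concrete, the following minimal sketch maps a single toy residue to one retained atom plus one auxiliary variable. It is an illustration only, not the BackDiff implementation: the function name `coarse_grain_residue`, the choice of the Cα atom as the CG atom, the side-chain center of mass as the auxiliary variable, and the toy coordinates are all assumptions made for this example.

```python
import numpy as np

# Assumed PDB-style backbone atom names; everything else is treated as side chain.
BACKBONE = {"N", "CA", "C", "O"}


def coarse_grain_residue(names, coords, masses, cg_atoms=("CA",)):
    """Hypothetical CG mapping for one residue.

    Returns the coordinates of the retained CG atoms and one CG auxiliary
    variable: the mass-weighted center of the side-chain atoms.
    """
    coords = np.asarray(coords, dtype=float)
    masses = np.asarray(masses, dtype=float)

    # CG atoms: a plain selection of atomistic coordinates.
    is_cg = np.array([n in cg_atoms for n in names])

    # CG auxiliary variable: a simple function of atomistic coordinates.
    is_side = np.array([n not in BACKBONE for n in names])
    if not is_side.any():
        raise ValueError("residue has no side-chain atoms")
    w = masses[is_side]
    side_com = (w[:, None] * coords[is_side]).sum(axis=0) / w.sum()

    return coords[is_cg], side_com


# Toy residue with a two-atom side chain (coordinates are made up).
names = ["N", "CA", "C", "O", "CB", "CG"]
coords = [
    [0.0, 0.0, 0.0],
    [1.5, 0.0, 0.0],
    [2.0, 1.4, 0.0],
    [1.3, 2.4, 0.0],
    [2.0, -0.8, 1.2],
    [3.0, -0.8, 1.2],
]
masses = [14.007, 12.011, 12.011, 15.999, 12.011, 12.011]

cg_xyz, side_com = coarse_grain_residue(names, coords, masses)
print(cg_xyz)    # coordinates of the retained CA atom
print(side_com)  # side-chain center of mass
```

Six atoms are reduced to two three-dimensional sites here, so many distinct all-atom configurations share the same CG representation. This is the information loss that a backmapping model has to resolve by sampling rather than by a deterministic inverse.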