SE(3)-Stochastic Flow Matching for Protein Backbone Generation (2310.02391v4)
Abstract: The computational design of novel protein structures has the potential to impact numerous scientific disciplines greatly. Toward this goal, we introduce FoldFlow, a series of novel generative models of increasing modeling power based on the flow-matching paradigm over $3\mathrm{D}$ rigid motions -- i.e. the group $\text{SE}(3)$ -- enabling accurate modeling of protein backbones. We first introduce FoldFlow-Base, a simulation-free approach to learning deterministic continuous-time dynamics and matching invariant target distributions on $\text{SE}(3)$. We next accelerate training by incorporating Riemannian optimal transport to create FoldFlow-OT, leading to the construction of flows that are both simpler and more stable. Finally, we design FoldFlow-SFM, coupling both Riemannian OT and simulation-free training to learn stochastic continuous-time dynamics over $\text{SE}(3)$. Our family of FoldFlow generative models offers several key advantages over previous approaches to the generative modeling of proteins: they are more stable and faster to train than diffusion-based approaches, and our models enjoy the ability to map any invariant source distribution to any invariant target distribution over $\text{SE}(3)$. Empirically, we validate FoldFlow on protein backbone generation of up to $300$ amino acids, leading to high-quality designable, diverse, and novel samples.
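To make the idea of simulation-free flow matching over rigid motions concrete, the following is a minimal sketch of a geodesic interpolant between two residue frames, treating a frame as a rotation in $\text{SO}(3)$ plus a translation in $\mathbb{R}^3$. It is a generic illustration and not necessarily the parameterization used in FoldFlow; the function names (`so3_exp`, `so3_log`, `interpolate_frame`, `target_velocity`) are hypothetical, and the logarithm assumes the relative rotation angle is below $\pi$.

```python
import numpy as np

def hat(w):
    """Map a 3-vector to the corresponding skew-symmetric matrix in so(3)."""
    x, y, z = w
    return np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])

def so3_exp(w):
    """Rodrigues' formula: rotation vector -> rotation matrix."""
    theta = np.linalg.norm(w)
    K = hat(w)
    if theta < 1e-8:
        return np.eye(3) + K  # first-order approximation near the identity
    a = np.sin(theta) / theta
    b = (1.0 - np.cos(theta)) / theta**2
    return np.eye(3) + a * K + b * (K @ K)

def so3_log(R):
    """Rotation matrix -> rotation vector (assumes rotation angle < pi)."""
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    # Entries of R - R^T give 2*sin(theta) times the rotation axis.
    v = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    if theta < 1e-8:
        return 0.5 * v
    return theta / (2.0 * np.sin(theta)) * v

def interpolate_frame(R0, x0, R1, x1, t):
    """Point at time t on the SO(3) geodesic and the straight line in R^3."""
    w = so3_log(R0.T @ R1)          # relative rotation as a tangent vector
    Rt = R0 @ so3_exp(t * w)        # geodesic from R0 (t=0) to R1 (t=1)
    xt = (1.0 - t) * x0 + t * x1    # linear interpolation of translations
    return Rt, xt

def target_velocity(R0, x0, R1, x1):
    """Constant body-frame angular velocity and linear velocity of the path."""
    return so3_log(R0.T @ R1), x1 - x0
```

In a flow-matching setup of this kind, a network would be regressed onto such target velocities at randomly sampled times `t`, so no trajectory has to be simulated during training.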
- Avishek Joey Bose
- Tara Akhound-Sadegh
- Kilian Fatras
- Guillaume Huguet
- Jarrid Rector-Brooks
- Cheng-Hao Liu
- Andrei Cristian Nica
- Maksym Korablyov
- Michael Bronstein
- Alexander Tong