Aligned Diffusion Schrödinger Bridges (2302.11419v3)
Abstract: Diffusion Schrödinger bridges (DSB) have recently emerged as a powerful framework for recovering stochastic dynamics from marginal observations at different time points. Despite numerous successful applications, existing algorithms for solving DSBs have so far failed to exploit the structure of aligned data, which arises naturally in many biological phenomena. In this paper, we propose a novel algorithmic framework that, for the first time, solves DSBs while respecting the data alignment. Our approach hinges on combining two ideas that are decades old: classical Schrödinger bridge theory and Doob's $h$-transform. Compared to prior methods, our approach leads to a simpler training procedure with lower variance, which we further augment with principled regularization schemes. This ultimately yields sizeable improvements across experiments on synthetic and real data, including the tasks of predicting conformational changes in proteins and the temporal evolution of cellular differentiation processes.
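The abstract names Doob's $h$-transform as one of the two ingredients. The sketch below illustrates that ingredient only. It is not the authors' training procedure. It assumes a scalar Brownian reference process $dX_t = g\,dW_t$ on $[0, T]$. The diffusion coefficient `g`, the horizon `T`, and the helper name `brownian_bridge_path` are assumptions chosen for illustration. Conditioning this process to end at a fixed point $x_1$ uses $h(t, x) = \mathcal{N}(x_1; x, g^2 (T - t))$. This adds the drift $g^2 \nabla_x \log h(t, x) = (x_1 - x)/(T - t)$ and yields a Brownian bridge, which is simulated here with the Euler-Maruyama scheme.

```python
import math
import random


def brownian_bridge_path(x0, x1, g=1.0, T=1.0, n_steps=100, seed=None):
    """Euler-Maruyama simulation of dX = (x1 - X)/(T - t) dt + g dW, X_0 = x0.

    Illustrative assumption: the reference process is dX = g dW (scalar).
    The drift g^2 * d/dx log h(t, x), with h(t, x) = N(x1; x, g^2 (T - t)),
    is Doob's h-transform that conditions the reference process to hit x1
    at time T.
    """
    rng = random.Random(seed)
    dt = T / n_steps
    x = x0
    path = [x]
    for k in range(n_steps):
        t = k * dt
        # h-transform drift pulls the state toward the endpoint x1;
        # T - t stays >= dt > 0 on this grid, so there is no division by zero.
        drift = (x1 - x) / (T - t)
        x = x + drift * dt + g * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        path.append(x)
    return path


# Usage: one bridge between a hypothetical aligned pair (x0, x1) = (0.0, 2.0).
path = brownian_bridge_path(0.0, 2.0, g=1.0, n_steps=1000, seed=0)
```

For aligned data, each observed pair $(x_0, x_1)$ could supply the endpoints of one such bridge. With `g=0` the simulated path reduces to straight-line interpolation between the endpoints. With `g > 0` the final state lands within noise of standard deviation $g\sqrt{\Delta t}$ of $x_1$ because of the last discrete step.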