
Meta Flow Matching: Integrating Vector Fields on the Wasserstein Manifold (2408.14608v2)

Published 26 Aug 2024 in cs.LG and stat.ML

Abstract: Numerous biological and physical processes can be modeled as systems of interacting entities evolving continuously over time, e.g. the dynamics of communicating cells or physical particles. Learning the dynamics of such systems is essential for predicting the temporal evolution of populations across novel samples and unseen environments. Flow-based models allow for learning these dynamics at the population level - they model the evolution of the entire distribution of samples. However, current flow-based models are limited to a single initial population and a set of predefined conditions which describe different dynamics. We argue that multiple processes in natural sciences have to be represented as vector fields on the Wasserstein manifold of probability densities. That is, the change of the population at any moment in time depends on the population itself due to the interactions between samples. In particular, this is crucial for personalized medicine where the development of diseases and their respective treatment response depend on the microenvironment of cells specific to each patient. We propose Meta Flow Matching (MFM), a practical approach to integrate along these vector fields on the Wasserstein manifold by amortizing the flow model over the initial populations. Namely, we embed the population of samples using a Graph Neural Network (GNN) and use these embeddings to train a Flow Matching model. This gives MFM the ability to generalize over the initial distributions, unlike previously proposed methods. We demonstrate the ability of MFM to improve the prediction of individual treatment responses on a large-scale multi-patient single-cell drug screen dataset.


Summary

  • The paper introduces Meta Flow Matching, which integrates vector fields on the Wasserstein manifold using GNN-based embeddings to generalize across varied initial data distributions.
  • It leverages conditional flow matching with a joint loss function that optimizes both the vector field and the GNN embedding, enhancing model flexibility.
  • Experimental results on synthetic datasets and single-cell drug screening data show superior predictive accuracy over traditional flow matching methods.


Overview

The paper "Meta Flow Matching: Integrating Vector Fields on the Wasserstein Manifold" presents an approach to modeling dynamical systems in which the evolution of a population's probability distribution is the object of interest. It addresses a limitation of existing flow-based models, which are generally restricted to a single initial population or a set of predefined conditions, by proposing Meta Flow Matching (MFM). MFM integrates vector fields on the Wasserstein manifold and generalizes across different initial distributions by embedding them with Graph Neural Networks (GNNs).

The paper's core contributions include a new framework for learning vector fields that describe population evolution, validation on synthetic datasets, and the application to single-cell drug screen data, showcasing the model's ability to predict individual treatment responses better than existing methods.

Methodology

Flow Matching and Conditional Flow Matching

Flow Matching models a continuous interpolation between probability densities over time, captured by a vector field parameterized by a neural network. The key constraint is the continuity equation, which prescribes how the density changes under the vector field. The training objective regresses the model's vector field onto the target vector field, in expectation over time and samples.
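As an illustration (not the paper's code), the flow-matching regression for a linear interpolant between paired samples can be sketched as follows; the `v_model` callable stands in for the learned network:

```python
import numpy as np

def interpolant(x0, x1, t):
    """Linear path x_t = (1 - t) * x0 + t * x1 and its velocity x1 - x0."""
    t = np.asarray(t).reshape(-1, 1)
    return (1.0 - t) * x0 + t * x1, x1 - x0

def fm_loss(v_model, x0, x1, t):
    """Monte Carlo flow-matching loss: E ||v(x_t, t) - u_t||^2."""
    x_t, u_t = interpolant(x0, x1, t)
    pred = v_model(x_t, t)
    return float(np.mean(np.sum((pred - u_t) ** 2, axis=1)))
```

An oracle that returns the true interpolant velocity drives this loss to zero, which is a quick sanity check for any implementation.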

Conditional Flow Matching (CFM) extends this approach by conditioning the vector field on auxiliary variables that represent different population dynamics. These conditioning variables are fed directly into the neural network's input, yielding a family of flows indexed by the condition.
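Concretely, this conditioning mechanism amounts to concatenating the condition onto the network input. A minimal sketch (the function name and layout are illustrative, not from the paper):

```python
import numpy as np

def conditional_input(x_t, t, c):
    """Build the network input for CFM by appending the condition c
    (e.g. a treatment-label embedding) to the state x_t and scalar time t."""
    n = x_t.shape[0]
    t_col = np.full((n, 1), float(t))                     # time channel
    c_rows = np.broadcast_to(                             # condition, tiled per sample
        np.asarray(c, dtype=float).reshape(1, -1), (n, np.size(c)))
    return np.concatenate([x_t, t_col, c_rows], axis=1)
```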

Meta Flow Matching (MFM)

The paper proposes Meta Flow Matching—a generalization of Flow Matching that leverages a GNN to embed entire initial population distributions. The GNN processes a graph constructed from the population samples, producing an embedding vector used as input to the vector field model. This approach allows the vector field to adapt based on the embedded representation, enabling the model to generalize to unseen population distributions.

The authors formalize this approach through a loss function that jointly optimizes the vector field and the GNN embedding parameters. The training algorithm is iterative, alternating updates between these components.
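A toy version of this joint objective can be sketched as follows, with a DeepSets-style mean pooling standing in for the paper's GNN embedding and a linear map standing in for the velocity network; all names and parameterizations here are illustrative assumptions:

```python
import numpy as np

def embed_population(pop, W_emb):
    """Permutation-invariant embedding of the whole initial population
    (mean pooling of a per-sample nonlinearity; a stand-in for the GNN)."""
    return np.tanh(pop @ W_emb).mean(axis=0)

def mfm_loss(W_v, W_emb, pop0, pop1, rng):
    """Monte Carlo MFM objective: the velocity model sees (x_t, t, phi(pop0))
    and regresses onto the interpolant velocity. Both W_v and W_emb would
    be optimized jointly, as in the paper's training algorithm."""
    n, d = pop0.shape
    t = rng.uniform(size=(n, 1))
    x_t = (1.0 - t) * pop0 + t * pop1          # linear interpolant
    u_t = pop1 - pop0                          # its velocity
    phi = embed_population(pop0, W_emb)        # population embedding
    feats = np.concatenate([x_t, t, np.tile(phi, (n, 1))], axis=1)
    pred = feats @ W_v                         # toy linear vector field
    return float(np.mean(np.sum((pred - u_t) ** 2, axis=1)))
```

Because the embedding is computed from the initial population itself, changing `pop0` changes the field, which is what lets the model generalize over initial distributions rather than memorizing a fixed set of conditions.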

Experimental Results

The evaluation encompassed both synthetic datasets and real-world single-cell perturbation data. The synthetic dataset demonstrated how MFM could generalize to previously unseen distributions, effectively learning the dynamics of letter silhouettes subjected to a diffusive process. On this dataset, MFM showed superior generalization performance compared to standard flow matching (FM) and conditional generative flow matching (CGFM).

For real-world applicability, MFM was tested on a large-scale single-cell drug screening dataset. This dataset contains patient-derived cell populations treated with various chemotherapies. The results indicated that MFM could predict the evolution of these populations under treatment conditions not seen during training. In contrast, other methods like FM and CGFM failed to generalize effectively, either due to their inability to model inter-sample interactions or lack of conditioning flexibility.

Implications and Future Directions

The proposed MFM framework has significant implications for modeling complex dynamical systems in natural sciences, particularly in personalized medicine. The ability to accurately predict the evolution of cell populations under different conditions can lead to better treatment strategies tailored to individual patients' microenvironments.

From a theoretical standpoint, MFM advances the field of probabilistic modeling on the Wasserstein manifold, introducing a robust method that accommodates a broader range of dynamic processes influenced by inter-sample interactions. The use of GNNs to embed population distributions allows for a scalable approach that can handle high-dimensional biological datasets.

Future research could extend MFM by exploring different GNN architectures or embedding techniques, potentially improving the embedding's accuracy and generalization capabilities. Additionally, incorporating stochastic elements into the model could handle more diverse real-world scenarios where dynamics are inherently noisy.

Conclusion

This paper demonstrates a significant step forward in modeling the dynamics of interacting systems at the population level. Meta Flow Matching effectively generalizes across diverse initial distributions, leveraging the representational power of GNNs, and integrates seamlessly with the mathematical framework provided by the Wasserstein manifold. The empirical results underscore its utility in both synthetic settings and practical applications, suggesting its potential impact on fields like personalized medicine and beyond.
