
Cross-modality Matching and Prediction of Perturbation Responses with Labeled Gromov-Wasserstein Optimal Transport (2405.00838v3)

Published 1 May 2024 in q-bio.GN and math.OC

Abstract: It is now possible to conduct large scale perturbation screens with complex readout modalities, such as different molecular profiles or high content cell images. While these open the way for systematic dissection of causal cell circuits, integrating such data across screens to maximize our ability to predict circuits poses substantial computational challenges, which have not been addressed. Here, we extend two Gromov-Wasserstein Optimal Transport methods to incorporate the perturbation label for cross-modality alignment. The obtained alignment is then employed to train a predictive model that estimates cellular responses to perturbations observed with only one measurement modality. We validate our method for the tasks of cross-modality alignment and cross-modality prediction in a recent multi-modal single-cell perturbation dataset. Our approach opens the way to unified causal models of cell biology.

Summary

  • The paper introduces Labeled GWOT by integrating perturbation labels into optimal transport to improve cross-modality alignment and predict cell state changes.
  • The methodology leverages efficient Sinkhorn iterations and block-level updates, achieving an L-fold speedup over traditional GWOT computations.
  • Experimental results on multi-modal single-cell RNA and protein data show superior matching and prediction accuracy compared to baseline OT methods.

Cross-modality Matching and Prediction of Perturbation Responses with Labeled Gromov-Wasserstein Optimal Transport

The paper by Ryu et al. focuses on advancing the computational methodologies used to analyze data from large-scale perturbation screens, which are pivotal in understanding gene function and small-molecule interactions within cells. The authors propose novel adaptations of Gromov-Wasserstein Optimal Transport (GWOT) methods to better handle cross-modality alignment and prediction tasks, particularly in the context of multi-modal single-cell perturbation data.

Overview

The authors present a methodological framework that incorporates perturbation labels into the GWOT and Co-Optimal Transport (COOT) algorithms to improve the performance of cross-modality alignments. This approach is motivated by the fact that typical perturbation screens involve a relatively homogeneous cell type and exhibit minor changes in cell state, complicating traditional alignment techniques. By utilizing perturbation labels as constraints in the optimal transport plan, the proposed methods, Labeled GWOT and Labeled COOT, aim to leverage the intrinsic grouping of samples to enhance prediction accuracy and computational efficiency.
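One way to write the label constraint down, offered as a sketch and not as the paper's exact formulation: take the standard squared-loss Gromov-Wasserstein objective and forbid mass between samples whose perturbation labels differ. The symbols below are assumptions introduced for illustration: $C^X$ and $C^Y$ are the within-modality distance matrices, $p$ and $q$ the sample weights, $T$ the coupling, and $\ell^X_i$, $\ell^Y_j$ the perturbation labels.

```latex
\min_{T \in \Pi(p, q)} \;
\sum_{i,k=1}^{n} \sum_{j,l=1}^{m}
\left( C^X_{ik} - C^Y_{jl} \right)^2 T_{ij} \, T_{kl}
\quad \text{subject to} \quad
T_{ij} = 0 \ \text{whenever} \ \ell^X_i \neq \ell^Y_j ,
\qquad
\Pi(p, q) = \left\{ T \geq 0 : T \mathbf{1}_m = p, \; T^{\top} \mathbf{1}_n = q \right\}.
```

Under this reading the coupling is block-structured, with one block per perturbation label, which is what makes label-level computation possible.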

Methodological Contributions

The core contribution involves extending standard GWOT formulations to account for perturbation label information, effectively introducing a label-compatible constraint within the optimal transport problem. This refinement is key to capturing the nuanced topology of the phenotypic space induced by cellular perturbations. Specifically, the paper details:

  1. Labeled GWOT: Integrates perturbation labels into the alignment process, reducing error and improving the prediction of cell state changes across modalities. Through a series of Sinkhorn iterations, the method provides a more informed coupling of cellular states across different data modalities.
  2. Computational Efficiency: By calculating transport at the resolution of perturbation labels, the proposed methods achieve an L-fold improvement in the cost computation compared to traditional GWOT, where L is the number of labels.
  3. Algorithmic Implementation: The authors develop efficient block-level updates for OT calculations, which process data in label-specific blocks, further optimizing computational resources.
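The three points above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`labeled_gw`, `sinkhorn`, `pairwise_sq_dists`), the choice to give every shared label equal mass, and the values of `eps` and the iteration counts are all hypothetical. The sketch alternates between linearizing the squared-loss GW objective and running Sinkhorn separately inside each label block, so no mass ever crosses labels. It recomputes the full gradient for clarity and does not attempt to reproduce the L-fold savings described above.

```python
import numpy as np

def pairwise_sq_dists(Z):
    """Squared Euclidean distances within one modality, scaled to [0, 1]."""
    d = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return d / d.max()

def sinkhorn(cost, a, b, eps, n_iter=200):
    """Entropic OT between weight vectors a and b for a single cost block."""
    K = np.exp(-(cost - cost.min()) / eps)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

def labeled_gw(X, Y, lx, ly, eps=0.05, n_outer=20):
    """Entropic GW coupling restricted to same-label blocks (illustrative)."""
    lx, ly = np.asarray(lx), np.asarray(ly)
    shared = np.intersect1d(lx, ly)
    C1, C2 = pairwise_sq_dists(X), pairwise_sq_dists(Y)

    # Assumption: each shared label carries mass 1/L, spread uniformly
    # over its cells, so every label block is feasible on its own.
    p, q = np.zeros(len(X)), np.zeros(len(Y))
    for lab in shared:
        p[lx == lab] = 1.0 / (len(shared) * (lx == lab).sum())
        q[ly == lab] = 1.0 / (len(shared) * (ly == lab).sum())
    blocks = [(np.where(lx == lab)[0], np.where(ly == lab)[0]) for lab in shared]

    # Start from the independent coupling inside each label block.
    T = np.zeros((len(X), len(Y)))
    for rows, cols in blocks:
        T[np.ix_(rows, cols)] = len(shared) * np.outer(p[rows], q[cols])

    const = (C1 ** 2 @ p)[:, None] + (C2 ** 2 @ q)[None, :]
    for _ in range(n_outer):
        # Linearized squared-loss GW cost at the current coupling.
        grad = const - 2.0 * C1 @ T @ C2.T
        # Block-level update: one small Sinkhorn problem per label.
        for rows, cols in blocks:
            idx = np.ix_(rows, cols)
            T[idx] = sinkhorn(grad[idx], p[rows], q[cols], eps)
    return T
```

Cells whose label appears in only one modality receive zero mass in this sketch; whether the paper treats them that way is not stated in the summary.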

Experimental Validation

The proposed methods are validated using a dataset comprising single-cell RNA and protein profiles obtained under various perturbations. The experiments demonstrate that Labeled GWOT and Labeled COOT achieve superior performance in terms of sample matching and prediction accuracy compared to baselines without label adaptation. Notably, GWOT-based approaches significantly outperform OT-based methods, emphasizing the benefit of comparing pairwise distances within each modality rather than relying on a cost metric shared across modalities.
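To see why a Gromov-Wasserstein cost suits cross-modality data, note that it only compares distances measured within each modality, so the two feature spaces need not share a dimension or a cost function. The sketch below is illustrative only; `sq_dists` and `gw_objective` are hypothetical helper names, and the squared loss is an assumption. It evaluates the GW objective for a given coupling and can be used to check that a distance-preserving map into a different dimension costs nothing under the correct matching.

```python
import numpy as np

def sq_dists(Z):
    """Squared Euclidean distances between the rows of Z."""
    return ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)

def gw_objective(C1, C2, T):
    """Squared-loss GW objective: sum_ijkl (C1_ik - C2_jl)^2 T_ij T_kl."""
    p, q = T.sum(axis=1), T.sum(axis=0)
    const = (C1 ** 2 @ p)[:, None] + (C2 ** 2 @ q)[None, :]
    return float(((const - 2.0 * C1 @ T @ C2.T) * T).sum())
```

For example, if `Y` is `X` embedded isometrically in a higher-dimensional space, the coupling that pairs each point with its own image has an objective of zero up to rounding, while a mismatched permutation does not. A plain OT cost between `X` and `Y` would first require a hand-chosen cross-modality distance.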

The prediction tasks involve estimating RNA responses from protein measurements, a problem relevant to reducing experimental costs of perturbation screens. Results indicate robust correlations between predicted and observed perturbation responses, underscoring the method's applicability in practical multi-modal integration scenarios.
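The summary does not say which predictive model the authors train on the alignment. As a hedged stand-in, the sketch below uses a common recipe: turn the coupling into soft RNA targets by barycentric projection, then fit a linear map from protein features to those targets by least squares. Both function names are hypothetical, the model has no intercept, and every row of the coupling is assumed to carry positive mass.

```python
import numpy as np

def barycentric_targets(T, Y):
    """Coupling-weighted average RNA profile for each protein-side cell.

    T has shape (n_protein_cells, n_rna_cells); Y holds RNA profiles by row.
    Assumes every row of T has positive mass.
    """
    mass = T.sum(axis=1, keepdims=True)
    return (T @ Y) / mass

def fit_linear(X, targets):
    """Least-squares linear map from protein features X to RNA targets."""
    W, *_ = np.linalg.lstsq(X, targets, rcond=None)
    return W
```

A new protein profile `x` would then be mapped to a predicted RNA profile with `x @ W`. Any regressor could replace the linear map; the point is only that the coupling supplies the training pairs that unpaired modalities otherwise lack.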

Implications and Future Directions

The work has both practical and theoretical implications. Practically, the methodology makes it easier to integrate high-content screening data measured in different modalities, which could support better-informed drug discovery. Theoretically, it demonstrates the potential of incorporating label information into optimal transport frameworks, paving the way for further enhancements in computational biology.

Future research could explore more advanced latent representations to mitigate modality-specific noise inherent in high-dimensional omics data. Additionally, scaling these approaches to include more complex and varied datasets through neural OT models or low-rank approximations may offer further improvements in performance and applicability across broader contexts within bioinformatics and systems biology.

Overall, the paper provides a detailed mathematical and algorithmic foundation for the use of labeled data in optimal transport problems, contributing significantly to the field of computational biology and multi-modal data analysis.
