Propensity Score Alignment of Unpaired Multimodal Data (2404.01595v2)
Abstract: Multimodal representation learning techniques typically rely on paired samples to learn common representations, but paired samples are challenging to collect in fields such as biology, where measurement devices often destroy the samples. This paper presents an approach for aligning unpaired samples across disparate modalities in multimodal representation learning. We draw an analogy between potential outcomes in causal inference and potential views in multimodal observations, which allows us to use Rubin's framework to estimate a common space in which to match samples. Our approach assumes we collect samples that are experimentally perturbed by treatments, and uses this to estimate a propensity score from each modality. The propensity score encapsulates all shared information between a latent state and treatment and can be used to define a distance between samples. We experiment with two alignment techniques that leverage this distance, shared nearest neighbours (SNN) and optimal transport (OT) matching, and find that OT matching results in significant improvements over state-of-the-art alignment approaches in both a synthetic multimodal setting and in real-world data from the NeurIPS Multimodal Single-Cell Integration Challenge.
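The pipeline the abstract describes (estimate a propensity score per modality, define a distance on it, then match with OT) can be sketched on toy data as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the synthetic generator, the softmax-regression propensity model, the squared Euclidean cost, and the helper names `fit_propensity` and `sinkhorn` are all hypothetical, and a hand-written Sinkhorn iteration stands in for a dedicated OT solver.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit_propensity(x, t, n_treatments, lr=0.1, steps=1000):
    """Hypothetical helper: softmax regression for p(t | x), plain gradient descent."""
    mu, sd = x.mean(axis=0), x.std(axis=0) + 1e-8

    def design(a):
        # Standardise features and append a bias column.
        return np.hstack([(a - mu) / sd, np.ones((len(a), 1))])

    xb = design(x)
    w = np.zeros((xb.shape[1], n_treatments))
    y = np.eye(n_treatments)[t]  # one-hot treatment labels
    for _ in range(steps):
        w -= lr * xb.T @ (softmax(xb @ w) - y) / len(x)
    return lambda a: softmax(design(a) @ w)

def sinkhorn(cost, reg=0.1, iters=200):
    """Hypothetical helper: entropic OT coupling between two uniform marginals."""
    n, m = cost.shape
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    kernel = np.exp(-cost / reg)
    u = np.ones(n)
    for _ in range(iters):
        v = b / (kernel.T @ u)
        u = a / (kernel @ v)
    return u[:, None] * kernel * v[None, :]

# Toy data (assumption): the treatment shifts a 2-D latent state, and each
# modality is a different noisy linear view of that latent state.
rng = np.random.default_rng(0)
n, k = 300, 3
means = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
t1 = rng.integers(0, k, size=n)
z = means[t1] + rng.normal(size=(n, 2))
perm = rng.permutation(n)  # modality 2 is observed in shuffled order (unpaired)
x1 = z @ rng.normal(size=(2, 5)) + 0.1 * rng.normal(size=(n, 5))
x2 = z[perm] @ rng.normal(size=(2, 4)) + 0.1 * rng.normal(size=(n, 4))
t2 = t1[perm]

# One propensity model per modality; both map into the same simplex over treatments.
pi1 = fit_propensity(x1, t1, k)(x1)
pi2 = fit_propensity(x2, t2, k)(x2)

# Distance between samples defined in propensity-score space.
cost = ((pi1[:, None, :] - pi2[None, :, :]) ** 2).sum(axis=-1)
coupling = sinkhorn(cost)
match = coupling.argmax(axis=1)  # hard match: most likely partner per row

print("matched pairs sharing a treatment:", (t2[match] == t1).mean())
```

Because both classifiers output points on the same treatment simplex, samples from modalities with different feature dimensions become directly comparable, which is what lets a single cost matrix drive the matching. The sketch matches all samples at once; restricting the matching to samples within the same treatment group is a natural variation.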