Disentangling shared and private latent factors in multimodal Variational Autoencoders (2403.06338v1)
Abstract: Generative models for multimodal data permit the identification of latent factors that may be associated with important determinants of observed data heterogeneity. Common or shared factors could be important for explaining variation across modalities, whereas other factors may be private and important only for explaining a single modality. Multimodal Variational Autoencoders, such as MVAE and MMVAE, are a natural choice for inferring those underlying latent factors and separating shared variation from private variation. In this work, we investigate their capability to reliably perform this disentanglement. In particular, we highlight a challenging problem setting where modality-specific variation dominates the shared signal. Taking a cross-modal prediction perspective, we demonstrate limitations of existing models and propose a modification that makes them more robust to modality-specific variation. Our findings are supported by experiments on synthetic as well as various real-world multi-omics data sets.
Authors: Kaspar Märtens, Christopher Yau