The VampPrior Mixture Model (2402.04412v2)
Abstract: Current clustering priors for deep latent variable models (DLVMs) require defining the number of clusters a priori and are susceptible to poor initializations. Addressing these deficiencies could greatly benefit deep learning-based scRNA-seq analysis by performing integration and clustering simultaneously. We adapt the VampPrior (Tomczak & Welling, 2018) into a Dirichlet process Gaussian mixture model, resulting in the VampPrior Mixture Model (VMM), a novel prior for DLVMs. We propose an inference procedure that alternates between variational inference and Empirical Bayes to cleanly distinguish variational and prior parameters. Using the VMM in a Variational Autoencoder attains highly competitive clustering performance on benchmark datasets. Augmenting scVI (Lopez et al., 2018), a popular scRNA-seq integration method, with the VMM significantly improves its performance and automatically arranges cells into biologically meaningful clusters.
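For orientation, the VampPrior of Tomczak & Welling (2018) defines the prior as a uniform mixture of the encoder's variational posteriors evaluated at K learnable pseudo-inputs u_k, p(z) = (1/K) Σ_k q(z | u_k). The NumPy sketch below evaluates that log-density. It assumes a hypothetical linear Gaussian encoder; the names `encoder`, `W_mu`, `W_lv`, and `log_weights` are illustrative and not taken from the paper. The optional `log_weights` argument only indicates where non-uniform mixture weights (for example, those of a truncated Dirichlet process mixture) could enter; it is not the VMM's actual parameterization, and the alternating variational inference / Empirical Bayes procedure is not shown.

```python
import numpy as np


def log_normal_diag(z, mean, log_var):
    """Log-density of a diagonal Gaussian, summed over the last axis."""
    return -0.5 * np.sum(
        np.log(2.0 * np.pi) + log_var + (z - mean) ** 2 / np.exp(log_var), axis=-1
    )


def logsumexp(a, axis=-1):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def encoder(x, W_mu, W_lv):
    """Hypothetical linear Gaussian encoder (an assumption, not the paper's network):
    q(z | x) = N(x @ W_mu, diag(exp(x @ W_lv)))."""
    return x @ W_mu, x @ W_lv


def vamp_log_prior(z, pseudo_inputs, W_mu, W_lv, log_weights=None):
    """log p(z) = log sum_k pi_k q(z | u_k); pi_k = 1/K recovers the VampPrior."""
    K = pseudo_inputs.shape[0]
    if log_weights is None:
        log_weights = np.full(K, -np.log(K))  # uniform mixture weights
    mu, lv = encoder(pseudo_inputs, W_mu, W_lv)  # each of shape (K, D)
    # Broadcast z (N, 1, D) against components (1, K, D) -> (N, K) log-densities
    comp = log_normal_diag(z[:, None, :], mu[None], lv[None])
    return logsumexp(comp + log_weights[None, :], axis=1)


# Usage with random (hypothetical) parameters
rng = np.random.default_rng(0)
x_dim, d, k = 5, 2, 3
U = rng.normal(size=(k, x_dim))          # K pseudo-inputs living in data space
W_mu = rng.normal(size=(x_dim, d))
W_lv = 0.1 * rng.normal(size=(x_dim, d))
z = rng.normal(size=(4, d))              # 4 latent samples
print(vamp_log_prior(z, U, W_mu, W_lv).shape)  # (4,)
```

Evaluating the mixture in log space with a max-shifted log-sum-exp keeps the computation stable when individual component densities underflow.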
- Minimum-distortion embedding. Foundations and Trends® in Machine Learning, 14(3):211–378, 2021. ISSN 1935-8237. Publisher: Now Publishers, Inc.
- Fixing a broken ELBO. pp. 159–168. PMLR, 2018. ISSN 2640-3498.
- Importance Weighted Autoencoders, November 2016. URL http://arxiv.org/abs/1509.00519. arXiv:1509.00519 [cs, stat].
- The specious art of single-cell genomics. PLOS Computational Biology, 19(8):e1011288, August 2023. ISSN 1553-7358. doi: 10.1371/journal.pcbi.1011288. URL https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1011288. Publisher: Public Library of Science.
- Deep unsupervised clustering with Gaussian mixture variational autoencoders. arXiv preprint arXiv:1611.02648, 2016.
- Rasmussen, C. E. The infinite Gaussian mixture model. Advances in Neural Information Processing Systems, pp. 554–560, 2000.
- Bayesian regularization for normal mixture estimation and model-based clustering. Journal of Classification, 24(2):155–181, 2007. ISSN 0176-4268. Publisher: Springer.
- ELBO surgery: yet another way to carve up the variational evidence lower bound. volume 1, 2016. Issue: 2.
- Approximate Dirichlet Process Computing in Finite Normal Mixtures: Smoothing and Prior Information. Journal of Computational and Graphical Statistics, 11(3):508–532, September 2002. ISSN 1061-8600, 1537-2715. doi: 10.1198/106186002411. URL https://www.tandfonline.com/doi/full/10.1198/106186002411.
- Variational Deep Embedding: An Unsupervised and Generative Approach to Clustering. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI 2017, Melbourne, Australia, August 19-25, 2017, pp. 1965–1972, 2017. doi: 10.24963/IJCAI.2017/273. URL https://doi.org/10.24963/ijcai.2017/273.
- Composing graphical models with neural networks for structured representations and fast inference. Advances in Neural Information Processing Systems, 29, 2016.
- Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
- Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114, 2013.
- Fast, sensitive and accurate integration of single-cell data with Harmony. Nature Methods, 16(12):1289–1296, December 2019. ISSN 1548-7091, 1548-7105. doi: 10.1038/s41592-019-0619-0. URL http://www.nature.com/articles/s41592-019-0619-0.
- Deep generative modeling for single-cell transcriptomics. Nature Methods, 15(12):1053–1058, 2018. ISSN 1548-7091. Publisher: Nature Publishing Group US New York.
- Benchmarking atlas-level data integration in single-cell genomics. Nature Methods, 19(1):41–50, 2022. ISSN 1548-7091. Publisher: Nature Publishing Group US New York.
- Eleven grand challenges in single-cell data science. Genome Biology, 21(1):1–35, 2020. ISSN 1474-760X. Publisher: BioMed Central.
- UMAP: Uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:1802.03426, 2018.
- Approximate inference for deep latent Gaussian mixtures. volume 2, pp. 131, 2016.
- Stick-Breaking Variational Autoencoders. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings, 2017. URL https://openreview.net/forum?id=S1jmAotxg.
- Black box variational inference. pp. 814–822. PMLR, 2014.
- Stochastic backpropagation and approximate inference in deep generative models. pp. 1278–1286. PMLR, 2014.
- Absence of microglia promotes diverse pathologies and early lethality in Alzheimer’s disease mice. Cell Reports, 39(11):110961, June 2022. ISSN 2211-1247. doi: 10.1016/j.celrep.2022.110961. URL https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9285116/.
- A new distribution on the simplex with auto-encoding applications. Advances in Neural Information Processing Systems, 32, 2019.
- Comprehensive Integration of Single-Cell Data. Cell, 177(7):1888–1902.e21, June 2019. ISSN 0092-8674. doi: 10.1016/j.cell.2019.05.031. URL https://linkinghub.elsevier.com/retrieve/pii/S0092867419305598.
- Interpretable factor models of single-cell RNA-seq via variational autoencoders. Bioinformatics, 36(11):3418–3421, 2020. ISSN 1367-4803. Publisher: Oxford University Press.
- VAE with a VampPrior. pp. 1214–1223. PMLR, 2018. ISSN 2640-3498.
- Visualizing data using t-SNE. Journal of Machine Learning Research, 9(11), 2008. ISSN 1532-4435.
- Williams, R. J. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Reinforcement Learning, pp. 5–32, 1992. ISBN 1461366089. Publisher: Springer.
- Probabilistic harmonization and annotation of single-cell transcriptomics data with deep generative models. Molecular Systems Biology, 17(1):e9620, 2021. ISSN 1744-4292.
- A Comprehensive Survey on Deep Clustering: Taxonomy, Challenges, and Future Directions, June 2022. URL http://arxiv.org/abs/2206.07579. arXiv:2206.07579 [cs].