The VampPrior Mixture Model (2402.04412v2)

Published 6 Feb 2024 in cs.LG, cs.AI, and stat.ML

Abstract: Current clustering priors for deep latent variable models (DLVMs) require defining the number of clusters a priori and are susceptible to poor initializations. Addressing these deficiencies could greatly benefit deep learning-based scRNA-seq analysis by performing integration and clustering simultaneously. We adapt the VampPrior (Tomczak & Welling, 2018) into a Dirichlet process Gaussian mixture model, resulting in the VampPrior Mixture Model (VMM), a novel prior for DLVMs. We propose an inference procedure that alternates between variational inference and Empirical Bayes to cleanly distinguish variational and prior parameters. Using the VMM in a Variational Autoencoder attains highly competitive clustering performance on benchmark datasets. Augmenting scVI (Lopez et al., 2018), a popular scRNA-seq integration method, with the VMM significantly improves its performance and automatically arranges cells into biologically meaningful clusters.
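
To make the idea of a mixture prior over latent codes concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It evaluates the log-density of a diagonal Gaussian mixture prior at a latent point and returns the per-component responsibilities that could serve as soft cluster assignments. In a VampPrior, the component means and variances would come from the encoder evaluated at learned pseudo-inputs; here they are hard-coded placeholders. All function names are hypothetical, and `update_weights` is a generic EM-style averaging step used as a stand-in, not the Empirical Bayes update the paper proposes.

```python
import math


def log_normal_diag(z, mean, var):
    """Log-density of a diagonal-covariance Gaussian evaluated at z."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(z, mean, var)
    )


def mixture_log_prior(z, weights, means, variances):
    """Return (log p(z), responsibilities) for a Gaussian mixture prior.

    Uses the log-sum-exp trick so that far-away components do not underflow.
    """
    log_terms = [
        math.log(w) + log_normal_diag(z, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(log_terms)
    log_p = top + math.log(sum(math.exp(t - top) for t in log_terms))
    responsibilities = [math.exp(t - log_p) for t in log_terms]
    return log_p, responsibilities


def update_weights(all_responsibilities):
    """Assumed stand-in update: average responsibilities over data points."""
    n = len(all_responsibilities)
    k = len(all_responsibilities[0])
    return [sum(r[j] for r in all_responsibilities) / n for j in range(k)]


# Placeholder components; a VampPrior would obtain these from the encoder.
weights = [0.5, 0.5]
means = [[0.0, 0.0], [5.0, 5.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]

log_p, resp = mixture_log_prior([0.0, 0.0], weights, means, variances)
new_weights = update_weights([resp, [0.0, 1.0]])
```

In this toy setting the point at the origin is assigned almost entirely to the first component, and averaging its responsibilities with a second point assigned to the other component gives roughly equal mixture weights.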

References (31)
  1. Minimum-distortion embedding. Foundations and Trends® in Machine Learning, 14(3):211–378, 2021. ISSN 1935-8237. Publisher: Now Publishers, Inc.
  2. Fixing a broken ELBO. pp.  159–168. PMLR, 2018. ISSN 2640-3498.
  3. Importance Weighted Autoencoders, November 2016. URL http://arxiv.org/abs/1509.00519. arXiv:1509.00519 [cs, stat].
  4. The specious art of single-cell genomics. PLOS Computational Biology, 19(8):e1011288, August 2023. ISSN 1553-7358. doi: 10.1371/journal.pcbi.1011288. URL https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1011288. Publisher: Public Library of Science.
  5. Deep unsupervised clustering with gaussian mixture variational autoencoders. arXiv preprint arXiv:1611.02648, 2016.
  6. Rasmussen, C. E. The infinite gaussian mixture model. Advances in neural information processing systems, pp. 554–560, 2000.
  7. Bayesian regularization for normal mixture estimation and model-based clustering. Journal of classification, 24(2):155–181, 2007. ISSN 0176-4268. Publisher: Springer.
  8. Elbo surgery: yet another way to carve up the variational evidence lower bound. volume 1, 2016. Issue: 2.
  9. Approximate Dirichlet Process Computing in Finite Normal Mixtures: Smoothing and Prior Information. Journal of Computational and Graphical Statistics, 11(3):508–532, September 2002. ISSN 1061-8600, 1537-2715. doi: 10.1198/106186002411. URL https://www.tandfonline.com/doi/full/10.1198/106186002411.
  10. Variational Deep Embedding: An Unsupervised and Generative Approach to Clustering. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI 2017, Melbourne, Australia, August 19-25, 2017, pp.  1965–1972, 2017. doi: 10.24963/IJCAI.2017/273. URL https://doi.org/10.24963/ijcai.2017/273.
  11. Composing graphical models with neural networks for structured representations and fast inference. Advances in neural information processing systems, 29, 2016.
  12. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
  13. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114, 2013.
  14. Fast, sensitive and accurate integration of single-cell data with Harmony. Nature Methods, 16(12):1289–1296, December 2019. ISSN 1548-7091, 1548-7105. doi: 10.1038/s41592-019-0619-0. URL http://www.nature.com/articles/s41592-019-0619-0.
  15. Deep generative modeling for single-cell transcriptomics. Nature methods, 15(12):1053–1058, 2018. ISSN 1548-7091. Publisher: Nature Publishing Group US New York.
  16. Benchmarking atlas-level data integration in single-cell genomics. Nature methods, 19(1):41–50, 2022. ISSN 1548-7091. Publisher: Nature Publishing Group US New York.
  17. Eleven grand challenges in single-cell data science. Genome biology, 21(1):1–35, 2020. ISSN 1474-760X. Publisher: BioMed Central.
  18. Umap: Uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:1802.03426, 2018.
  19. Approximate inference for deep latent gaussian mixtures. volume 2, pp.  131, 2016.
  20. Stick-Breaking Variational Autoencoders. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings, 2017. URL https://openreview.net/forum?id=S1jmAotxg.
  21. Black box variational inference. pp.  814–822. PMLR, 2014.
  22. Stochastic backpropagation and approximate inference in deep generative models. pp.  1278–1286. PMLR, 2014.
  23. Absence of microglia promotes diverse pathologies and early lethality in Alzheimer’s disease mice. Cell reports, 39(11):110961, June 2022. ISSN 2211-1247. doi: 10.1016/j.celrep.2022.110961. URL https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9285116/.
  24. A new distribution on the simplex with auto-encoding applications. Advances in Neural Information Processing Systems, 32, 2019.
  25. Comprehensive Integration of Single-Cell Data. Cell, 177(7):1888–1902.e21, June 2019. ISSN 00928674. doi: 10.1016/j.cell.2019.05.031. URL https://linkinghub.elsevier.com/retrieve/pii/S0092867419305598.
  26. Interpretable factor models of single-cell RNA-seq via variational autoencoders. Bioinformatics, 36(11):3418–3421, 2020. ISSN 1367-4803. Publisher: Oxford University Press.
  27. VAE with a VampPrior. pp.  1214–1223. PMLR, 2018. ISSN 2640-3498.
  28. Visualizing data using t-SNE. Journal of machine learning research, 9(11), 2008. ISSN 1532-4435.
  29. Williams, R. J. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Reinforcement learning, pp.  5–32, 1992. ISBN 1461366089. Publisher: Springer.
  30. Probabilistic harmonization and annotation of single‐cell transcriptomics data with deep generative models. Molecular systems biology, 17(1):e9620, 2021. ISSN 1744-4292.
  31. A Comprehensive Survey on Deep Clustering: Taxonomy, Challenges, and Future Directions, June 2022. URL http://arxiv.org/abs/2206.07579. arXiv:2206.07579 [cs].
