Scalable Amortized GPLVMs for Single Cell Transcriptomics Data (2405.03879v1)
Abstract: Dimensionality reduction is crucial for analyzing large-scale single-cell RNA-seq data. Gaussian Process Latent Variable Models (GPLVMs) offer an interpretable dimensionality reduction method, but current scalable models lack effectiveness in clustering cell types. We introduce an improved model, the amortized stochastic variational Bayesian GPLVM (BGPLVM), tailored for single-cell RNA-seq with specialized encoder, kernel, and likelihood designs. This model matches the performance of the leading single-cell variational inference (scVI) approach on synthetic and real-world COVID datasets. It also incorporates cell-cycle and batch information to reveal more interpretable latent structures, as we demonstrate on an innate immunity dataset.