K-Nearest-Neighbors Induced Topological PCA for scRNA Sequence Data Analysis

Published 23 Oct 2023 in q-bio.QM, cs.LG, and math.AT | (2310.14521v1)

Abstract: Single-cell RNA sequencing (scRNA-seq) is widely used to reveal heterogeneity in cells, which has given us insights into cell-cell communication, cell differentiation, and differential gene expression. However, analyzing scRNA-seq data is challenging because of its sparsity and the large number of genes involved. Dimensionality reduction and feature selection are therefore important for removing spurious signals and enhancing downstream analysis. Traditional PCA, a workhorse of dimensionality reduction, cannot capture the geometrical structure embedded in the data, and previous graph Laplacian regularizations are limited to analysis at a single scale. We propose a topological Principal Components Analysis (tPCA) method that combines the persistent Laplacian (PL) technique with $L_{2,1}$-norm regularization to address multiscale and multiclass heterogeneity issues in data. We further introduce a k-Nearest-Neighbor (kNN) persistent Laplacian technique to improve the robustness of our persistent Laplacian method. The proposed kNN-PL is a new algebraic topology technique that addresses many limitations of traditional persistent homology. Rather than inducing a filtration by varying a distance threshold, we introduce kNN-tPCA, in which the filtration is obtained by varying the number of neighbors in a kNN network at each step, and we find that this framework has significant implications for hyper-parameter tuning. We validate the efficacy of the proposed tPCA and kNN-tPCA methods on 11 diverse benchmark scRNA-seq datasets and show that they outperform other unsupervised PCA enhancements from the literature, as well as the popular Uniform Manifold Approximation and Projection (UMAP), t-Distributed Stochastic Neighbor Embedding (tSNE), and Projection Non-Negative Matrix Factorization (NMF), by significant margins.
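The kNN-induced filtration described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it only builds a nested family of kNN graphs by increasing the neighbor count, forms the ordinary graph Laplacian at each step, and accumulates them with weights. The function names (`knn_adjacency`, `knn_filtration_laplacian`, `l21_norm`), the symmetrization rule, and the weighting scheme are all assumptions made for illustration; the paper's persistent Laplacian construction may differ.

```python
import numpy as np

def knn_adjacency(X, k):
    """Symmetric 0/1 adjacency of the kNN graph on the rows of X (hypothetical helper)."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)  # pairwise squared distances
    np.fill_diagonal(d2, np.inf)                      # a point is not its own neighbor
    idx = np.argsort(d2, axis=1)[:, :k]               # k closest points per row
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    return np.maximum(A, A.T)                         # assumed rule: edge if either side lists the other

def graph_laplacian(A):
    """Combinatorial graph Laplacian L = D - A."""
    return np.diag(A.sum(axis=1)) - A

def knn_filtration_laplacian(X, ks, weights):
    """Weighted sum of graph Laplacians over the kNN filtration steps in ks (assumed aggregation)."""
    return sum(w * graph_laplacian(knn_adjacency(X, k)) for k, w in zip(ks, weights))

def l21_norm(M):
    """L_{2,1} norm: sum of the Euclidean norms of the rows of M."""
    return float(np.sum(np.linalg.norm(M, axis=1)))

# Toy data: 20 "cells" with 5 "genes" each.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
L_acc = knn_filtration_laplacian(X, ks=[1, 2, 3], weights=[1.0, 1.0, 1.0])
```

Because each row's neighbor list for `k` is a prefix of its list for `k + 1`, the graphs are nested, which is what makes the sequence a filtration. In this sketch the weights play the role of the hyper-parameters that control how much each scale contributes.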
