
Analyzing scRNA-seq data by CCP-assisted UMAP and t-SNE (2306.13750v1)

Published 23 Jun 2023 in cs.LG

Abstract: Single-cell RNA sequencing (scRNA-seq) is widely used to reveal heterogeneity in cells, which has given us insights into cell-cell communication, cell differentiation, and differential gene expression. However, analyzing scRNA-seq data is a challenge due to sparsity and the large number of genes involved. Therefore, dimensionality reduction and feature selection are important for removing spurious signals and enhancing downstream analysis. Correlated clustering and projection (CCP) was recently introduced as an effective method for preprocessing scRNA-seq data. CCP utilizes gene-gene correlations to partition the genes and, based on the partition, employs cell-cell interactions to obtain super-genes. Because CCP is a data-domain approach that does not require matrix diagonalization, it can be used in many downstream machine learning tasks. In this work, we utilize CCP as an initialization tool for uniform manifold approximation and projection (UMAP) and t-distributed stochastic neighbor embedding (t-SNE). Using eight publicly available datasets, we have found that CCP significantly improves UMAP and t-SNE visualization and dramatically improves their accuracy.
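The two-step idea in the abstract (partition the genes by correlation, then collapse each gene group into one super-gene per cell) can be illustrated with a minimal sketch. This is not the authors' CCP implementation: as a stated assumption, k-means on z-scored gene profiles stands in for the correlation-based partition, and a per-group mean stands in for the cell-cell-interaction projection. The function names `partition_genes` and `super_genes` are hypothetical.

```python
import numpy as np

def partition_genes(X, n_groups, n_iter=25, seed=0):
    """Group genes (columns of X, cells x genes) with similar profiles.

    Stand-in for a correlation-based partition: for z-scored genes,
    squared Euclidean distance is a decreasing function of Pearson
    correlation, so k-means groups positively correlated genes.
    """
    rng = np.random.default_rng(seed)
    Z = X - X.mean(axis=0)
    Z = Z / (Z.std(axis=0) + 1e-12)   # guard against zero-variance genes
    G = Z.T                           # genes x cells
    start = rng.choice(G.shape[0], size=n_groups, replace=False)
    centers = G[start]                # fancy indexing returns a copy
    for _ in range(n_iter):
        # distance from every gene profile to every center
        d = ((G[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for k in range(n_groups):
            members = G[labels == k]
            if len(members):          # keep the old center if a group is empty
                centers[k] = members.mean(axis=0)
    return labels, Z

def super_genes(Z, labels, n_groups):
    """Collapse each gene group into one value per cell.

    Assumption: a plain mean replaces the projection used by CCP.
    """
    S = np.zeros((Z.shape[0], n_groups))
    for k in range(n_groups):
        cols = labels == k
        if cols.any():
            S[:, k] = Z[:, cols].mean(axis=1)
    return S
```

The result has shape `(n_cells, n_groups)`. With a larger `n_groups` it could serve as the reduced input to UMAP or t-SNE; with `n_groups=2` it has the shape of the array that scikit-learn's `TSNE` and umap-learn's `UMAP` accept through their `init` parameter. Which of these corresponds to the paper's setup is not stated in the abstract.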
