scCDCG: Efficient Deep Structural Clustering for single-cell RNA-seq via Deep Cut-informed Graph Embedding (2404.06167v1)

Published 9 Apr 2024 in cs.LG, cs.AI, and q-bio.GN

Abstract: Single-cell RNA sequencing (scRNA-seq) is essential for unraveling cellular heterogeneity and diversity, offering invaluable insights for bioinformatics. Despite its potential, traditional clustering methods in scRNA-seq data analysis often neglect the structural information embedded in gene expression profiles, which is crucial for understanding cellular correlations and dependencies. Existing strategies, including graph neural networks, suffer from inefficiency caused by the intrinsic high dimensionality and high sparsity of scRNA-seq data. To address these limitations, we introduce scCDCG (single-cell RNA-seq Clustering via Deep Cut-informed Graph), a novel framework for efficient and accurate clustering of scRNA-seq data that also exploits high-order intercellular structural information. scCDCG comprises three main components: (i) a graph embedding module based on deep cut-informed techniques, which captures high-order intercellular structural information while avoiding the over-smoothing and inefficiency issues common in prior graph neural network methods; (ii) a self-supervised learning module guided by optimal transport, tailored to the specific difficulties of scRNA-seq data, namely its high dimensionality and high sparsity; and (iii) an autoencoder-based feature learning module that reduces model complexity through effective dimension reduction and feature extraction. Extensive experiments on 6 datasets demonstrate scCDCG's superior performance and efficiency compared with 7 established models, underscoring its potential as a transformative tool in scRNA-seq data analysis. Our code is available at: https://github.com/XPgogogo/scCDCG.
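The abstract states that the self-supervised module is "guided by optimal transport" but does not give its formulation. As a hedged illustration only, the sketch below shows one common way optimal transport is used in self-supervised clustering: entropic Sinkhorn-Knopp scaling (Sinkhorn, 1967) that turns cell-to-cluster similarity scores into soft assignments whose clusters carry equal mass. The helper name `sinkhorn_assign`, the uniform marginals, and the defaults `epsilon=0.05` and `n_iters=200` are assumptions for illustration, not details taken from scCDCG or its repository.

```python
import math


def sinkhorn_assign(scores, epsilon=0.05, n_iters=200):
    """Hypothetical helper: entropic-OT soft assignment of cells to clusters.

    scores: n_cells x n_clusters similarity matrix (list of lists), e.g.
    cosine similarities between cell embeddings and cluster centroids.
    Returns Q where each row sums to ~1 and each column sums to
    n_cells / n_clusters, i.e. clusters are constrained to equal size.
    """
    n, k = len(scores), len(scores[0])
    # Gibbs kernel K = exp(scores / epsilon); subtracting the global max
    # avoids overflow and only rescales K by a constant, which the
    # scaling vectors absorb.
    m = max(max(row) for row in scores)
    K = [[math.exp((s - m) / epsilon) for s in row] for row in scores]
    r = 1.0 / n  # uniform row marginal: every cell carries equal mass
    c = 1.0 / k  # uniform column marginal: every cluster gets equal mass
    u, v = [1.0] * n, [1.0] * k
    for _ in range(n_iters):
        # Sinkhorn-Knopp: rescale rows to match r, then columns to match c.
        u = [r / sum(K[i][j] * v[j] for j in range(k)) for i in range(n)]
        v = [c / sum(K[i][j] * u[i] for i in range(n)) for j in range(k)]
    # Transport plan P = diag(u) K diag(v); multiply by n so rows sum to ~1.
    return [[n * u[i] * K[i][j] * v[j] for j in range(k)] for i in range(n)]


# Four cells, two clusters; cell 2 only weakly prefers cluster 0.
scores = [[0.9, 0.1], [0.8, 0.2], [0.6, 0.5], [0.2, 0.7]]
Q = sinkhorn_assign(scores)
labels = [max(range(2), key=lambda j: row[j]) for row in Q]
naive = [max(range(2), key=lambda j: row[j]) for row in scores]
print("argmax of raw scores:", naive)
print("balanced OT labels:  ", labels)
```

Taking the argmax of the raw scores would put three of the four cells in cluster 0; the equal-mass constraint instead moves the weakly committed cell 2 to cluster 1. Enforcing balanced marginals in this way is one common motivation for OT-based pseudo-labels, since it discourages degenerate solutions in which most cells collapse into a single cluster.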

