scRNA-seq Data Clustering by Cluster-aware Iterative Contrastive Learning (2312.16600v1)

Published 27 Dec 2023 in q-bio.GN, cs.AI, and cs.LG

Abstract: Single-cell RNA sequencing (scRNA-seq) enables researchers to analyze gene expression at the single-cell level. One important task in scRNA-seq data analysis is unsupervised clustering, which helps identify distinct cell types and lays the foundation for downstream analysis tasks. In this paper, we propose a novel method called Cluster-aware Iterative Contrastive Learning (CICL) for scRNA-seq data clustering. CICL uses an iterative representation-learning and clustering framework with a cluster-aware contrastive loss to progressively learn the clustering structure of scRNA-seq data. CICL consists of a Transformer encoder, a clustering head, a projection head, and a contrastive loss module. First, CICL extracts feature vectors of the original and augmented data with the Transformer encoder. Then, in the clustering head, it computes cluster centroids by K-means and uses the Student's t-distribution to assign pseudo-labels to all cells. The projection head uses a Multi-Layer Perceptron (MLP) to obtain projections of the augmented data. Finally, both the pseudo-labels and the projections are used in the contrastive loss to guide model training. This process is repeated iteratively, so that the clustering result improves progressively. Extensive experiments on 25 real-world scRNA-seq datasets show that CICL outperforms state-of-the-art (SOTA) methods. Concretely, on average, CICL surpasses existing methods by 14% to 280% in terms of Adjusted Rand Index (ARI) and by 5% to 133% in terms of Normalized Mutual Information (NMI).
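The abstract names the clustering-head and loss steps but does not give their formulas. Below is a minimal, standard-library-only Python sketch of those two steps under stated assumptions. It is not the authors' implementation. First, it assumes that the Student's t soft assignment takes the form used in Deep Embedded Clustering (DEC), with α = 1 degree of freedom. Second, it assumes that the "cluster-aware" contrastive loss treats cells sharing a pseudo-label as positives, in a supervised-contrastive (SupCon-style) form. The function names (`student_t_soft_assign`, `pseudo_labels`, `cluster_aware_contrastive_loss`), the temperature of 0.5, and the toy vectors are hypothetical. The embeddings and centroids stand in for the Transformer encoder output and the K-means centroids.

```python
import math


def student_t_soft_assign(z, centroids, alpha=1.0):
    """Soft assignment q[i][j] of cell i to cluster j via a Student's t kernel.
    Assumed DEC-style form; alpha is the degrees of freedom (hypothetical default)."""
    q = []
    for zi in z:
        row = []
        for mu in centroids:
            d2 = sum((a - b) ** 2 for a, b in zip(zi, mu))
            row.append((1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0))
        total = sum(row)
        q.append([v / total for v in row])  # each row sums to 1
    return q


def pseudo_labels(q):
    """Hard pseudo-label = most probable cluster for each cell."""
    return [max(range(len(row)), key=row.__getitem__) for row in q]


def _l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # guard against zero vectors
    return [x / norm for x in v]


def cluster_aware_contrastive_loss(projections, labels, temperature=0.5):
    """SupCon-style loss (assumption): for each anchor cell, the other cells with
    the same pseudo-label are positives; every cell except the anchor enters the
    softmax denominator."""
    p = [_l2_normalize(v) for v in projections]
    n = len(p)
    total, n_anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor with no positives contributes nothing
        # Cosine similarity between anchor i and every cell, scaled by temperature.
        logits = [sum(a * b for a, b in zip(p[i], p[j])) / temperature for j in range(n)]
        log_denom = math.log(sum(math.exp(logits[j]) for j in range(n) if j != i))
        # Average negative log-probability of the positives.
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        n_anchors += 1
    return total / n_anchors if n_anchors else 0.0


# Toy usage: hand-picked stand-ins for encoder embeddings and K-means centroids.
z = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 4.9]]
centroids = [[0.0, 0.0], [5.0, 5.0]]
q = student_t_soft_assign(z, centroids)
labels = pseudo_labels(q)  # -> [0, 0, 1, 1]

# Hand-picked stand-ins for MLP projections of augmented cells.
projections = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss = cluster_aware_contrastive_loss(projections, labels)
shuffled = cluster_aware_contrastive_loss(projections, [0, 1, 0, 1])
```

In this toy example, the loss is lower when the pseudo-labels agree with the geometry of the projections than under shuffled labels. Minimizing it therefore pulls cells of the same pseudo-cluster together. In an iterative scheme like the one the abstract describes, the embeddings would then be re-clustered and the pseudo-labels refreshed before the next round.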
