scRNA-seq Data Clustering by Cluster-aware Iterative Contrastive Learning (2312.16600v1)
Abstract: Single-cell RNA sequencing (scRNA-seq) enables researchers to analyze gene expression at the single-cell level. One important task in scRNA-seq data analysis is unsupervised clustering, which helps identify distinct cell types and lays the foundation for other downstream analysis tasks. In this paper, we propose a novel method called Cluster-aware Iterative Contrastive Learning (CICL for short) for scRNA-seq data clustering. CICL uses an iterative representation learning and clustering framework to progressively learn the clustering structure of scRNA-seq data with a cluster-aware contrastive loss. CICL consists of a Transformer encoder, a clustering head, a projection head and a contrastive loss module. First, CICL extracts feature vectors of the original and augmented data with the Transformer encoder. Then, in the clustering head, it computes cluster centroids by K-means and uses the Student's t-distribution to assign pseudo-labels to all cells. The projection head uses a Multi-Layer Perceptron (MLP) to obtain projections of the augmented data. Finally, both the pseudo-labels and the projections are used in the contrastive loss to guide model training. This process is repeated iteratively, so the clustering result improves progressively. Extensive experiments on 25 real-world scRNA-seq datasets show that CICL outperforms state-of-the-art (SOTA) methods. Concretely, on average CICL surpasses existing methods by 14% to 280% in terms of ARI and by 5% to 133% in terms of NMI.
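The clustering head and the cluster-aware contrastive loss described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several pieces are assumptions: the Transformer encoder is replaced by synthetic embeddings, the augmentation (feature masking plus noise) and the random two-layer projection head are hypothetical, and since the abstract does not give CICL's exact loss, a supervised-contrastive (SupCon-style) loss in which cells sharing a pseudo-label are treated as positives stands in for it. The soft assignment follows the DEC form q_ij ∝ (1 + ||z_i - μ_j||² / α)^(-(α+1)/2) with α = 1.

```python
import numpy as np


def kmeans(z, k, n_iter=50, seed=0):
    """Plain Lloyd's K-means; returns a (k, d) array of centroids."""
    rng = np.random.default_rng(seed)
    centroids = z[rng.choice(len(z), size=k, replace=False)].copy()
    for _ in range(n_iter):
        d2 = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        assign = d2.argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):  # keep the old centroid if a cluster empties
                centroids[j] = z[assign == j].mean(axis=0)
    return centroids


def student_t_assign(z, centroids, alpha=1.0):
    """DEC-style soft assignment with a Student's t kernel; rows sum to 1."""
    d2 = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)


def cluster_aware_contrastive_loss(p1, p2, labels, tau=0.5):
    """SupCon-style stand-in (assumption): projections sharing a pseudo-label are positives."""
    z = np.concatenate([p1, p2], axis=0)
    z = z / (np.linalg.norm(z, axis=1, keepdims=True) + 1e-12)
    y = np.concatenate([labels, labels])
    self_mask = np.eye(len(z), dtype=bool)
    sim = np.where(self_mask, -np.inf, z @ z.T / tau)  # exclude self-pairs
    m = sim.max(axis=1, keepdims=True)  # stabilize the log-sum-exp
    log_prob = sim - m - np.log(np.exp(sim - m).sum(axis=1, keepdims=True))
    pos = (y[:, None] == y[None, :]) & ~self_mask
    # Mean log-probability over each anchor's positives (at least its other view).
    per_anchor = np.where(pos, log_prob, 0.0).sum(axis=1) / pos.sum(axis=1)
    return -per_anchor.mean()


rng = np.random.default_rng(0)
# Synthetic stand-in for encoder output: two well-separated groups of 20 cells.
h = np.vstack([rng.normal(0.0, 0.3, (20, 8)), rng.normal(3.0, 0.3, (20, 8))])

# Clustering head: K-means centroids, Student's t soft assignment, argmax pseudo-labels.
centroids = kmeans(h, k=2)
q = student_t_assign(h, centroids)
pseudo = q.argmax(axis=1)


def augment(x):
    """Hypothetical augmentation: randomly mask ~10% of features and add small noise."""
    keep = rng.random(x.shape) > 0.1
    return x * keep + rng.normal(0.0, 0.05, x.shape)


W1, W2 = rng.normal(size=(8, 16)), rng.normal(size=(16, 4))


def project(x):
    """Hypothetical two-layer MLP projection head with random weights."""
    return np.maximum(x @ W1, 0.0) @ W2


loss = cluster_aware_contrastive_loss(project(augment(h)), project(augment(h)), pseudo)
print(pseudo, round(float(loss), 3))
```

Because augmentation is applied to the embeddings rather than to the raw expression profiles, and NumPy has no automatic differentiation, the sketch computes the loss only once. In the iterative scheme the abstract describes, this loss would be backpropagated through the encoder and projection head, after which the centroids and pseudo-labels would be recomputed for the next round.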