Single-Cell RNA-seq Synthesis with Latent Diffusion Model (2312.14220v1)
Abstract: Single-cell RNA sequencing (scRNA-seq) technology enables researchers to study complex biological systems and diseases at high resolution. The central challenge is synthesizing enough scRNA-seq samples; insufficient samples can impede downstream analysis and reproducibility. Although various methods have been attempted in past research, the resulting scRNA-seq samples were often of poor quality or offered limited coverage of useful specific cell subpopulations. To address these issues, we propose a novel method called Single-Cell Latent Diffusion (SCLD), based on the diffusion model. Within a unified framework, this method can synthesize large-scale, high-quality scRNA-seq samples, either 'holistically' or targeted to specific cellular subpopulations. A pre-guidance mechanism is designed for synthesizing specific cellular subpopulations, while a post-guidance mechanism aims to enhance the quality of the synthesized scRNA-seq samples. The synthesized samples can serve various downstream tasks. Our experimental results on two scRNA-seq benchmarks demonstrate state-of-the-art performance in cell classification and data distribution distances. Additionally, visualization experiments show SCLD's capability to synthesize specific cellular subpopulations.
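The abstract names a latent diffusion model but does not spell out the noising process. As background only, the sketch below shows the closed-form forward noising step of a standard denoising diffusion probabilistic model (DDPM), applied to a latent vector. The linear schedule, its endpoint values, the 4-dimensional latent, and all function names are assumptions for illustration and are not taken from the paper; SCLD's actual encoder, schedule, and guidance mechanisms may differ.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (a common DDPM default; assumed here)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alpha_bar(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def forward_noise(x0, t, alpha_bars, rng):
    """Sample x_t from q(x_t | x_0) = N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I)."""
    signal = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    return [signal * v + noise_scale * rng.gauss(0.0, 1.0) for v in x0]


rng = random.Random(0)
betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alpha_bar(betas)

# Hypothetical 4-dimensional latent code of one cell (illustrative values only).
latent = [0.5, -1.2, 0.0, 2.3]
slightly_noisy = forward_noise(latent, 10, alpha_bars, rng)
nearly_gaussian = forward_noise(latent, 999, alpha_bars, rng)
```

At an early step the sample stays close to the original latent, while at the final step almost all signal has been replaced by Gaussian noise. A generative model of this family is trained to reverse that corruption step by step.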