Single-Cell RNA-seq Synthesis with Latent Diffusion Model
Abstract: Single-cell RNA sequencing (scRNA-seq) enables researchers to study complex biological systems and diseases at high resolution. A central challenge is obtaining enough scRNA-seq samples, because insufficient samples can impede downstream analysis and undermine reproducibility. Various methods have been proposed in past research, but the scRNA-seq samples they produce are often of poor quality or offer limited coverage of specific, useful cell subpopulations. To address these issues, we propose Single-Cell Latent Diffusion (SCLD), a novel method based on diffusion models. Within a unified framework, SCLD synthesizes large-scale, high-quality scRNA-seq samples for various downstream tasks, covering both holistic samples and targeted specific cellular subpopulations. A pre-guidance mechanism is designed for synthesizing specific cellular subpopulations, while a post-guidance mechanism aims to enhance the quality of the synthesized samples. Our experimental results demonstrate state-of-the-art performance in cell classification and in data distribution distances on two scRNA-seq benchmarks. Additionally, visualization experiments show SCLD's capability to synthesize specific cellular subpopulations.
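The abstract does not specify how SCLD's pre-guidance and post-guidance are implemented. As a generic illustration only, the sketch below shows two standard building blocks of label-conditioned latent diffusion: DDPM-style forward noising of latent codes (Ho et al., 2020) and classifier-free guidance (Ho & Salimans, 2022), one common way to steer sampling toward a label such as a cell type. This is not the SCLD implementation. The linear noise schedule, the latent shape, and the function names `make_alpha_bars`, `q_sample`, `predict_z0` and `cfg_epsilon` are hypothetical, and the "denoiser outputs" are placeholder arrays rather than a trained network.

```python
import numpy as np


def make_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Assumed linear beta schedule; returns cumulative products alpha_bar_t."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def q_sample(z0, t, alpha_bars, rng):
    """Forward noising: z_t = sqrt(abar_t) * z_0 + sqrt(1 - abar_t) * eps.

    z0 is a batch of latent codes (hypothetically, autoencoder embeddings of
    expression profiles). Returns the noised latents and the noise used.
    """
    eps = rng.standard_normal(z0.shape)
    abar = alpha_bars[t]
    return np.sqrt(abar) * z0 + np.sqrt(1.0 - abar) * eps, eps


def predict_z0(zt, eps_hat, t, alpha_bars):
    """Invert the forward process given a noise estimate eps_hat."""
    abar = alpha_bars[t]
    return (zt - np.sqrt(1.0 - abar) * eps_hat) / np.sqrt(abar)


def cfg_epsilon(eps_uncond, eps_cond, guidance_scale):
    """Classifier-free guidance: move from the unconditional noise estimate
    toward the label-conditioned one. guidance_scale = 1 gives the purely
    conditional estimate; values > 1 strengthen the conditioning."""
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    alpha_bars = make_alpha_bars()
    z0 = rng.standard_normal((4, 16))  # hypothetical: 4 cells, 16-dim latent
    zt, eps = q_sample(z0, 500, alpha_bars, rng)
    # With the true noise, the clean latent is recovered exactly.
    print(np.allclose(predict_z0(zt, eps, 500, alpha_bars), z0))
```

In a trained model, `eps_cond` and `eps_uncond` would come from the same denoiser evaluated with and without the label (the label is randomly dropped during training), and `predict_z0` would be applied to the guided estimate at each reverse step before decoding the latent back to expression space.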