Single-Cell RNA-seq Synthesis with Latent Diffusion Model (2312.14220v1)

Published 21 Dec 2023 in q-bio.GN, cs.AI, and cs.LG

Abstract: Single-cell RNA sequencing (scRNA-seq) enables researchers to study complex biological systems and diseases at high resolution. A central challenge is generating enough scRNA-seq samples, since insufficient samples can impede downstream analysis and undermine reproducibility. Previous methods have often produced samples of poor quality or with limited coverage of specific, useful cell subpopulations. To address these issues, we propose Single-Cell Latent Diffusion (SCLD), a novel method based on the diffusion model. SCLD synthesizes large-scale, high-quality scRNA-seq samples within a unified framework, covering both 'holistic' samples and targeted cellular subpopulations. A pre-guidance mechanism is designed for synthesizing specific cellular subpopulations, while a post-guidance mechanism enhances the quality of the synthesized samples, which can then be used for various downstream tasks. Our experiments demonstrate state-of-the-art performance in cell classification and data distribution distances on two scRNA-seq benchmarks. Visualization experiments further show SCLD's ability to synthesize specific cellular subpopulations.
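The abstract does not spell out how SCLD's pre-guidance and post-guidance work. As rough orientation only, the sketch below shows generic DDPM-style ancestral sampling (Ho et al., 2020) with classifier-free guidance (Ho & Salimans, 2022) in a small latent space, conditioned on a cell-type label. Treating pre-guidance as label conditioning of this kind is an assumption, not the paper's stated design. Several parts are also simplified or invented for illustration:

- The encoder/decoder of a latent diffusion model is omitted.
- The trained noise-prediction network is replaced by the hypothetical `toy_eps`. It returns the closed-form noise estimate for Gaussian latents.
- `sample`, `cell_type_means` and the guidance weight `w` are illustrative names, not SCLD's API.

```python
import numpy as np

# Sketch only: diffusion runs directly on 2-D "latents". In a latent diffusion
# model, an autoencoder would map expression profiles to and from this space.


def make_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule (as in DDPM) and cumulative products alpha_bar."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alpha_bar = np.cumprod(1.0 - betas)
    return betas, alpha_bar


def toy_eps(zt, t, alpha_bar, mean):
    """Hypothetical stand-in for a trained noise predictor.

    Under the forward process z_t = sqrt(ab) * z_0 + sqrt(1 - ab) * eps with
    z_0 ~ N(mean, I), this returns E[eps | z_t] exactly. A real model is a network.
    """
    return np.sqrt(1.0 - alpha_bar[t]) * (zt - np.sqrt(alpha_bar[t]) * mean)


def sample(class_mean, uncond_mean, n, w, betas, alpha_bar, rng):
    """Ancestral DDPM sampling with classifier-free guidance weight w."""
    z = rng.standard_normal((n, class_mean.shape[0]))  # start from pure noise
    for t in reversed(range(len(betas))):
        eps_c = toy_eps(z, t, alpha_bar, class_mean)   # conditional estimate
        eps_u = toy_eps(z, t, alpha_bar, uncond_mean)  # unconditional estimate
        eps = (1.0 + w) * eps_c - w * eps_u            # guided noise estimate
        alpha_t = 1.0 - betas[t]
        # Posterior mean of z_{t-1} given z_t and the noise estimate.
        mean = (z - betas[t] / np.sqrt(1.0 - alpha_bar[t]) * eps) / np.sqrt(alpha_t)
        noise = rng.standard_normal(z.shape) if t > 0 else 0.0
        z = mean + np.sqrt(betas[t]) * noise           # sigma_t^2 = beta_t
    return z


rng = np.random.default_rng(0)
betas, alpha_bar = make_schedule()
# Hypothetical cell-type latent means; the zero vector crudely stands in for
# an unconditional model.
cell_type_means = {"A": np.array([-2.0, 0.0]), "B": np.array([2.0, 0.0])}
uncond_mean = np.zeros(2)

samples_b = sample(cell_type_means["B"], uncond_mean, n=500, w=1.0,
                   betas=betas, alpha_bar=alpha_bar, rng=rng)
print(samples_b.mean(axis=0))
```

With `w = 0` the sampler simply draws from the class-conditional distribution. With `w > 0` the guided estimate pushes samples past the class mean and away from the unconditional mean, which is why guidance is often used to sharpen conditional samples.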
