
scRDiT: Generating single-cell RNA-seq data by diffusion transformers and accelerating sampling (2404.06153v1)

Published 9 Apr 2024 in cs.LG and q-bio.GN

Abstract: Motivation: Single-cell RNA sequencing (scRNA-seq) is a groundbreaking technology widely used in biological research to examine gene expression at the level of individual cells within a tissue sample. Although numerous tools have been developed for scRNA-seq data analysis, it remains challenging to capture the distinctive features of such data and to generate virtual datasets with analogous statistical properties. Results: Our study introduces a generative approach termed scRNA-seq Diffusion Transformer (scRDiT), which generates virtual scRNA-seq data by learning from a real dataset. The method is a neural network built on Denoising Diffusion Probabilistic Models (DDPMs) and Diffusion Transformers (DiTs). Gaussian noise is added to the real data through iterative noise-adding steps, and the noise is ultimately restored to form scRNA-seq samples. This scheme allows the model to learn data features from actual scRNA-seq samples during training. Our experiments, conducted on two distinct scRNA-seq datasets, demonstrate superior performance. Additionally, the sampling process is accelerated by incorporating Denoising Diffusion Implicit Models (DDIM). scRDiT offers a unified methodology that lets users train neural network models on their own scRNA-seq datasets and generate large numbers of high-quality scRNA-seq samples. Availability and implementation: https://github.com/DongShengze/scRDiT
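The noise-adding and restoring steps described in the abstract can be illustrated with a minimal NumPy sketch. It is not taken from the scRDiT repository: the linear beta schedule, the function names (`linear_beta_schedule`, `q_sample`, `ddim_step`), and the 4 cells x 2000 genes toy matrix are all assumptions made for illustration, and the true noise stands in for the output of a trained noise-prediction network.

```python
import numpy as np

def linear_beta_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    # Assumed schedule (the common DDPM default), not necessarily scRDiT's.
    return np.linspace(beta_start, beta_end, num_steps)

def alpha_bar(betas):
    # Cumulative product of (1 - beta_t): fraction of signal kept at step t.
    return np.cumprod(1.0 - betas)

def q_sample(x0, t, abar, rng):
    # Forward (noise-adding) process in closed form:
    # x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(abar[t]) * x0 + np.sqrt(1.0 - abar[t]) * eps
    return xt, eps

def ddim_step(xt, eps_pred, t, t_prev, abar):
    # Deterministic DDIM update (eta = 0) from step t to an earlier step
    # t_prev, which may skip many intermediate steps. t_prev = -1 means
    # "fully denoised".
    x0_pred = (xt - np.sqrt(1.0 - abar[t]) * eps_pred) / np.sqrt(abar[t])
    abar_prev = abar[t_prev] if t_prev >= 0 else 1.0
    return np.sqrt(abar_prev) * x0_pred + np.sqrt(1.0 - abar_prev) * eps_pred

rng = np.random.default_rng(0)
x0 = rng.random((4, 2000))            # hypothetical 4 cells x 2000 genes
abar = alpha_bar(linear_beta_schedule())

xt, eps = q_sample(x0, 999, abar, rng)        # heavily noised sample
# Here the true noise replaces a trained network's prediction.
x_mid = ddim_step(xt, eps, 999, 499, abar)    # jump over 500 steps at once
x_rec = ddim_step(x_mid, eps, 499, -1, abar)  # final jump back to data
```

Because DDIM updates are deterministic and can jump between non-adjacent steps, sampling needs far fewer network evaluations than stepping through every DDPM step, which is the kind of speed-up the abstract refers to. With a real model, `eps` in the two `ddim_step` calls would be replaced by the network's noise prediction at each step.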
