Kernel-Based Testing for Single-Cell Differential Analysis (2307.08509v3)

Published 17 Jul 2023 in stat.ML and cs.LG

Abstract: Single-cell technologies offer insights into molecular feature distributions, but comparing them poses challenges. We propose a kernel-testing framework for non-linear cell-wise distribution comparison, analyzing gene expression and epigenomic modifications. Our method allows feature-wise and global transcriptome/epigenome comparisons, revealing cell population heterogeneities. Using a classifier based on embedding variability, we identify transitions in cell states, overcoming limitations of traditional single-cell analysis. Applied to single-cell ChIP-Seq data, our approach identifies untreated breast cancer cells with an epigenomic profile resembling persister cells. This demonstrates the effectiveness of kernel testing in uncovering subtle population variations that might be missed by other methods.
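To make the idea of kernel-based two-sample testing concrete, here is a minimal sketch of one classical kernel test: the maximum mean discrepancy (MMD) with a Gaussian kernel and a permutation p-value. This is not the statistic proposed in the paper, and it is not the authors' implementation. The function names, the median-heuristic bandwidth, and the number of permutations are all assumptions chosen for illustration. Rows of `x` and `y` stand for cells from two conditions and columns for features.

```python
import numpy as np


def squared_distances(z):
    """Pairwise squared Euclidean distances between the rows of z."""
    sq = np.sum(z ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (z @ z.T)
    return np.maximum(d2, 0.0)  # clip tiny negatives caused by round-off


def gaussian_gram(z, bandwidth):
    """Gaussian (RBF) Gram matrix on the pooled sample z."""
    return np.exp(-squared_distances(z) / (2.0 * bandwidth ** 2))


def median_bandwidth(z):
    """Median heuristic: median pairwise distance (an illustrative choice)."""
    d2 = squared_distances(z)
    upper = np.triu_indices_from(d2, k=1)
    med = np.median(np.sqrt(d2[upper]))
    return med if med > 0 else 1.0


def mmd2_biased(gram, idx_x, idx_y):
    """Biased estimate of the squared MMD from a precomputed Gram matrix."""
    kxx = gram[np.ix_(idx_x, idx_x)].mean()
    kyy = gram[np.ix_(idx_y, idx_y)].mean()
    kxy = gram[np.ix_(idx_x, idx_y)].mean()
    return kxx + kyy - 2.0 * kxy


def mmd_permutation_test(x, y, n_perm=500, seed=0):
    """Return (observed statistic, permutation p-value) for H0: same distribution."""
    rng = np.random.default_rng(seed)
    z = np.vstack([x, y])
    n = len(x)
    gram = gaussian_gram(z, median_bandwidth(z))
    idx = np.arange(len(z))
    observed = mmd2_biased(gram, idx[:n], idx[n:])
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(idx)  # shuffle the condition labels
        if mmd2_biased(gram, perm[:n], perm[n:]) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)
```

Passing a single column of `x` and `y` gives a feature-wise comparison, while passing all columns gives a global one. The Gram matrix is computed once and only re-indexed under each permutation, which keeps the permutation loop cheap.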
