Causal Inference for Genomic Data with Multiple Heterogeneous Outcomes (2404.09119v4)

Published 14 Apr 2024 in stat.ME, stat.AP, and stat.ML

Abstract: With the evolution of single-cell RNA sequencing techniques into a standard approach in genomics, it has become possible to conduct cohort-level causal inference based on single-cell-level measurements. However, the individual gene expression levels of interest are not directly observable; instead, only repeated proxy measurements from each individual's cells are available, which yield a derived outcome that estimates the underlying outcome for each of many genes. In this paper, we propose a generic semiparametric inference framework for doubly robust estimation with multiple derived outcomes, which also encompasses the usual multiple-outcome setting in which each unit's response is directly observed. To reliably quantify causal effects on heterogeneous outcomes, we specialize the analysis to standardized average treatment effects and quantile treatment effects. Through these estimands, we demonstrate the use of the semiparametric inferential results for doubly robust estimators derived from both von Mises expansions and estimating equations. A multiple testing procedure based on the Gaussian multiplier bootstrap is tailored to doubly robust estimators to control the false discovery exceedance rate. Applications to single-cell CRISPR perturbation analysis and individual-level differential expression analysis demonstrate the utility of the proposed methods and offer insights into the choice of estimands for causal inference in genomics.
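To make the two main ingredients of the abstract more concrete, here is a minimal sketch of doubly robust (AIPW) estimation across many outcomes combined with a Gaussian multiplier bootstrap built from the estimated influence values. It is not the paper's method and rests on several simplifying assumptions: outcomes are fully observed (not derived from cell-level proxies), the propensity model is logistic and the outcome models are linear, the estimand is the plain average treatment effect rather than the standardized ATE or quantile treatment effect, and the bootstrap targets familywise error through a max-|t| critical value, whereas the paper's procedure controls the false discovery exceedance rate. All function names (`aipw_many_outcomes`, `maxt_multiplier_bootstrap`, etc.) are hypothetical, and only NumPy is used.

```python
import numpy as np


def _design(X):
    # Prepend an intercept column to the covariate matrix.
    return np.column_stack([np.ones(len(X)), X])


def fit_propensity(X, A, n_iter=25):
    # Assumed nuisance model: unpenalized logistic regression of A on X (Newton-Raphson).
    Xd = _design(X)
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xd @ beta))
        hess = Xd.T @ (Xd * (p * (1.0 - p))[:, None])
        beta += np.linalg.solve(hess, Xd.T @ (A - p))
    return lambda Xnew: 1.0 / (1.0 + np.exp(-_design(Xnew) @ beta))


def fit_outcomes(X, Y):
    # Assumed nuisance model: least squares of all m outcome columns on X at once.
    coef, *_ = np.linalg.lstsq(_design(X), Y, rcond=None)
    return lambda Xnew: _design(Xnew) @ coef


def aipw_many_outcomes(X, A, Y, n_folds=2, seed=0):
    # Cross-fitted AIPW (doubly robust) ATE for each column of Y, shape (n, m).
    # Returns the estimates (m,) and the centered influence values (n, m).
    n, m = Y.shape
    folds = np.random.default_rng(seed).permutation(n) % n_folds
    phi = np.empty((n, m))
    for k in range(n_folds):
        test, train = folds == k, folds != k
        # Propensities are clipped to [0.01, 0.99] to avoid extreme weights.
        pi = np.clip(fit_propensity(X[train], A[train])(X[test]), 0.01, 0.99)[:, None]
        mu1 = fit_outcomes(X[train & (A == 1)], Y[train & (A == 1)])(X[test])
        mu0 = fit_outcomes(X[train & (A == 0)], Y[train & (A == 0)])(X[test])
        a, y = A[test][:, None], Y[test]
        phi[test] = mu1 - mu0 + a * (y - mu1) / pi - (1 - a) * (y - mu0) / (1 - pi)
    tau = phi.mean(axis=0)
    return tau, phi - tau


def maxt_multiplier_bootstrap(tau, infl, n_boot=2000, alpha=0.05, seed=1):
    # Gaussian multiplier bootstrap of max_j |t_j|: a critical value for
    # simultaneous (familywise) inference across all m outcomes.
    n = infl.shape[0]
    se = infl.std(axis=0, ddof=1) / np.sqrt(n)
    xi = np.random.default_rng(seed).standard_normal((n_boot, n))
    boot_t = (xi @ infl) / n / se  # (n_boot, m) bootstrap t-statistics
    crit = np.quantile(np.abs(boot_t).max(axis=1), 1 - alpha)
    return tau / se, crit, tau - crit * se, tau + crit * se


# Simulated example: 50 outcomes, only the first 5 have a nonzero effect.
rng = np.random.default_rng(42)
n, p, m = 2000, 3, 50
X = rng.standard_normal((n, p))
A = rng.binomial(1, 1.0 / (1.0 + np.exp(-0.5 * X[:, 0])))
tau_true = np.zeros(m)
tau_true[:5] = 1.0
Y = X @ rng.standard_normal((p, m)) + A[:, None] * tau_true + rng.standard_normal((n, m))

tau_hat, infl = aipw_many_outcomes(X, A, Y)
t_stat, crit, lower, upper = maxt_multiplier_bootstrap(tau_hat, infl)
print("critical value:", round(float(crit), 2))
print("outcomes with simultaneous rejection:", np.flatnonzero(np.abs(t_stat) > crit))
```

The bootstrap reuses the per-unit influence values from the doubly robust fit, so a single multiplier draw perturbs all outcomes jointly and preserves their correlation; this is what makes the max-statistic critical value valid simultaneously. Cross-fitting keeps each unit's nuisance predictions independent of its own data. Extending the sketch to an exceedance-controlling procedure, derived outcomes, or the standardized and quantile estimands would require the paper's specific constructions, which are not reproduced here.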
