
The Conditional Prediction Function: A Novel Technique to Control False Discovery Rate for Complex Models (2310.04919v1)

Published 7 Oct 2023 in stat.ME, cs.LG, and stat.ML

Abstract: In modern scientific research, the objective is often to identify which variables are associated with an outcome among a large class of potential predictors. This goal can be achieved by selecting variables in a manner that controls the false discovery rate (FDR), the proportion of irrelevant predictors among the selections. Knockoff filtering is a cutting-edge approach to variable selection that provides FDR control. Existing knockoff statistics frequently employ linear models to assess relationships between features and the response, but the linearity assumption is often violated in real-world applications. This may result in poor power to detect truly prognostic variables. We introduce a knockoff statistic based on the conditional prediction function (CPF), which can pair with state-of-the-art machine learning predictive models, such as deep neural networks. The CPF statistics can capture the nonlinear relationships between predictors and outcomes while also accounting for correlation between features. We illustrate the capability of the CPF statistics to provide superior power over common knockoff statistics with continuous, categorical, and survival outcomes using repeated simulations. Knockoff filtering with the CPF statistics is demonstrated using (1) a residential building dataset to select predictors for the actual sales prices and (2) the TCGA dataset to select genes that are correlated with disease staging in lung cancer patients.
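
As a rough illustration of how knockoff filtering turns feature statistics into an FDR-controlled selection, the sketch below applies the standard knockoff+ threshold of Barber and Candès to a vector of statistics W, where a large positive W_j suggests that feature j is more informative than its knockoff copy. It is not the CPF statistic from this paper: how W is computed is left open, and the function name `knockoff_plus_select` and the example values are assumptions made purely for illustration.

```python
import numpy as np

def knockoff_plus_select(W, q=0.1):
    """Return indices selected by the knockoff+ threshold at target FDR q.

    W is any vector of knockoff statistics (hypothetical input here);
    the paper's CPF statistic would be one way to produce it.
    """
    W = np.asarray(W, dtype=float)
    # Candidate thresholds: the distinct nonzero magnitudes, smallest first.
    candidates = np.sort(np.unique(np.abs(W[W != 0])))
    for t in candidates:
        # Estimated false discovery proportion at threshold t.
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            # Smallest threshold meeting the target: keep features above it.
            return np.flatnonzero(W >= t)
    # No threshold achieves the target FDR, so nothing is selected.
    return np.array([], dtype=int)

# Made-up statistics for ten features.
W = [5, 4, 3, 2.5, 2, 1.5, -1, 0.5, -0.2, 0.1]
print(knockoff_plus_select(W, q=0.2))  # [0 1 2 3 4 5]
```

The "+1" in the numerator is what distinguishes knockoff+ from the plain knockoff threshold; it makes the selection more conservative when few features are chosen.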
