
Integrated path stability selection (2403.15877v2)

Published 23 Mar 2024 in stat.ME and stat.ML

Abstract: Stability selection is a popular method for improving feature selection algorithms. One of its key attributes is that it provides theoretical upper bounds on the expected number of false positives, E(FP), enabling control of false positives in practice. However, stability selection often selects very few features, resulting in low sensitivity. This is because existing bounds on E(FP) are relatively loose, causing stability selection to overestimate the number of false positives. In this paper, we introduce a novel approach to stability selection based on integrating stability paths rather than maximizing over them. This yields upper bounds on E(FP) that are orders of magnitude stronger than previous bounds, leading to significantly more true positives in practice for the same target E(FP). Furthermore, our method takes the same amount of computation as the original stability selection algorithm, and only requires one user-specified parameter, which can be either the target E(FP) or target false discovery rate. We demonstrate the method on simulations and real data from prostate and colon cancer studies.
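To make the contrast between "maximizing over" and "integrating" a stability path concrete, here is a minimal sketch in plain Python. It is not the paper's algorithm: the base selector (thresholding absolute marginal covariance at a level `lam`) is a hypothetical stand-in for a lasso-type method, the integral is approximated by a plain average over an evenly spaced grid, and the paper's actual integrand and E(FP) bounds are not reproduced. All function and variable names are illustrative assumptions.

```python
import random

def selection_frequencies(X, y, lambdas, n_subsamples=50, seed=0):
    """Estimate stability paths: for each lam, the fraction of random
    half-samples in which each feature is selected."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    half = n // 2
    counts = [[0] * p for _ in lambdas]
    for _ in range(n_subsamples):
        idx = rng.sample(range(n), half)  # half-sample without replacement
        # Stand-in selector (assumption): absolute marginal covariance score.
        scores = [abs(sum(X[i][j] * y[i] for i in idx)) / half for j in range(p)]
        for k, lam in enumerate(lambdas):
            for j in range(p):
                if scores[j] > lam:
                    counts[k][j] += 1
    return [[c / n_subsamples for c in row] for row in counts]

def max_over_path(freqs):
    """Classical summary: maximum selection frequency over the lam grid."""
    p = len(freqs[0])
    return [max(row[j] for row in freqs) for j in range(p)]

def integrate_over_path(freqs):
    """Integrated summary: average selection frequency over the lam grid
    (a Riemann approximation of a normalized integral)."""
    p = len(freqs[0])
    return [sum(row[j] for row in freqs) / len(freqs) for j in range(p)]

# Toy data: 10 features, only the first two influence the response.
data_rng = random.Random(1)
n, p = 60, 10
X = [[data_rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [row[0] + row[1] + data_rng.gauss(0, 1) for row in X]

lambdas = [0.1 * k for k in range(1, 11)]  # grid of thresholds 0.1 .. 1.0
freqs = selection_frequencies(X, y, lambdas)
max_scores = max_over_path(freqs)
int_scores = integrate_over_path(freqs)

for j in range(p):
    print(f"feature {j}: max={max_scores[j]:.2f}  integrated={int_scores[j]:.2f}")
```

The maximum rewards a feature for being selected often at any single value of `lam`, whereas the average only stays high if the feature is selected consistently across the grid. In this sketch the integrated score can never exceed the maximum score for the same feature.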
