
DeepDRK: Deep Dependency Regularized Knockoff for Feature Selection (2402.17176v2)

Published 27 Feb 2024 in cs.LG

Abstract: Model-X knockoff has garnered significant attention among various feature selection methods due to its guarantees for controlling the false discovery rate (FDR). Since its introduction in parametric design, knockoff techniques have evolved to handle arbitrary data distributions using deep learning-based generative models. However, we have observed limitations in the current implementations of the deep Model-X knockoff framework. Notably, the "swap property" that knockoffs require often faces challenges at the sample level, resulting in diminished selection power. To address these issues, we develop "Deep Dependency Regularized Knockoff (DeepDRK)," a distribution-free deep learning method that effectively balances FDR and power. In DeepDRK, we introduce a novel formulation of the knockoff model as a learning problem under multi-source adversarial attacks. By employing an innovative perturbation technique, we achieve lower FDR and higher power. Our model outperforms existing benchmarks across synthetic, semi-synthetic, and real-world datasets, particularly when sample sizes are small and data distributions are non-Gaussian.
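For readers unfamiliar with how knockoffs turn into a feature selection rule with FDR control, the following is a minimal sketch of the standard knockoff+ selection step from the general knockoff literature. It is not DeepDRK itself and does not generate knockoffs; it assumes that a vector of feature statistics `W` has already been computed (large positive values favor the original feature over its knockoff). The function name `knockoff_plus_select` and the parameter `q` are illustrative names, not taken from the paper.

```python
import numpy as np


def knockoff_plus_select(W, q=0.1):
    """Return indices selected by the knockoff+ filter at target FDR level q.

    W is assumed to be a precomputed vector of knockoff feature statistics,
    one entry per feature (hypothetical input for illustration).
    """
    W = np.asarray(W, dtype=float)
    # Candidate thresholds: the distinct nonzero magnitudes of W, ascending.
    candidates = np.sort(np.unique(np.abs(W[W != 0])))
    for t in candidates:
        # Conservative estimate of the false discovery proportion at t:
        # negatives beyond -t stand in for false positives beyond +t.
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            # Smallest threshold meeting the target: keep features above it.
            return np.flatnonzero(W >= t)
    # No threshold meets the target level: select nothing.
    return np.array([], dtype=int)


if __name__ == "__main__":
    W = [5, 4, 3, 2.5, 2, 1.5, -1, 0.5, 0.3, -0.2]
    print(knockoff_plus_select(W, q=0.2))
```

The quality of the generated knockoffs affects how well the statistics `W` separate relevant from irrelevant features, which is where a generator such as the one proposed in the paper would enter; the selection step itself stays the same.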

