Nonparametric Bayesian Knockoff Generators for Feature Selection Under Complex Data Structure (2111.06985v2)

Published 12 Nov 2021 in stat.ME

Abstract: The recent proliferation of high-dimensional data, such as electronic health records and genetics data, offers new opportunities to find novel predictors of outcomes. Presented with a large set of candidate features, interest often lies in selecting the ones most likely to be predictive of an outcome for further study. Controlling the false discovery rate (FDR) at a specified level is often desired when evaluating these variables. Knockoff filtering is an innovative strategy for conducting FDR-controlled feature selection. This paper proposes a nonparametric Bayesian model for generating high-quality knockoff copies that can improve the accuracy of predictive feature identification for variables arising from complex distributions, which can be skewed, highly dispersed and/or a mixture of distributions. This paper provides a detailed description of how to generate knockoff copies from a GDPM model via MCMC posterior sampling. Additionally, we provide a theoretical guarantee on the robustness of the knockoff procedure. Through simulations, the method is shown to identify important features with accurate FDR control and improved power over the popular second-order Gaussian knockoff generator. Furthermore, the model is compared with a finite Gaussian mixture knockoff generator in terms of FDR and power. The proposed technique is applied to detect genes predictive of survival in ovarian cancer patients using data from The Cancer Genome Atlas (TCGA).
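
For readers unfamiliar with knockoff filtering, the selection step that follows knockoff generation can be sketched as below. This is a minimal illustration of the standard knockoff+ threshold rule, not the paper's GDPM generator. It assumes a vector of feature statistics W has already been computed from the original features and their knockoff copies, with large positive values indicating evidence for a feature. The function name `knockoff_plus_select` and the example values are hypothetical.

```python
import numpy as np

def knockoff_plus_select(W, q=0.1):
    """Return indices selected by the knockoff+ rule at target FDR level q.

    W is assumed to be a 1-D array of knockoff statistics, one per feature,
    where a positive value favors the original feature over its knockoff.
    """
    W = np.asarray(W, dtype=float)
    # Candidate thresholds are the distinct nonzero magnitudes of W.
    candidates = np.sort(np.unique(np.abs(W[W != 0])))
    for t in candidates:
        # Estimated false discovery proportion at threshold t; the "+1"
        # in the numerator is what distinguishes knockoff+ from knockoff.
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            # Smallest threshold meeting the target: select features above it.
            return np.flatnonzero(W >= t)
    # No threshold meets the target level, so nothing is selected.
    return np.array([], dtype=int)

# Hypothetical statistics for ten features.
W_example = [5, 4, 3, 2.5, 2, 1.5, -1, 0.5, -0.2, 0.1]
selected = knockoff_plus_select(W_example, q=0.2)
```

The quality of the knockoff copies affects how well the statistics W separate predictive features from null ones, which is where a generator such as the one proposed in the abstract would enter; the thresholding rule itself is unchanged.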
