Contextual Feature Selection with Conditional Stochastic Gates (2312.14254v2)
Abstract: Feature selection is a crucial tool in machine learning and is widely applied across various scientific disciplines. Traditional supervised methods generally identify a universal set of informative features for the entire population. However, feature relevance often varies with context, while the context itself may not directly affect the outcome variable. Here, we propose a novel architecture for contextual feature selection in which the subset of selected features is conditioned on the value of context variables. Our approach, Conditional Stochastic Gates (c-STG), models feature importance with conditional Bernoulli variables whose parameters are predicted from the contextual variables. To learn these context-dependent gates, we introduce a hypernetwork that maps context variables to feature-selection parameters and is trained jointly with a prediction model. We further present a theoretical analysis showing that the model can improve performance and flexibility over population-level methods in complex feature selection settings. Finally, we conduct an extensive benchmark on simulated and real-world datasets across multiple domains, demonstrating that c-STG improves feature selection while enhancing prediction accuracy and interpretability.
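The abstract describes an architecture in which a hypernetwork maps context variables to per-feature gate parameters, and a relaxed Bernoulli gate is applied element-wise to the features before a prediction model. The following is a minimal PyTorch sketch of that idea, not the authors' reference implementation: the layer widths, the noise scale `sigma`, the sparsity weight, and the clipped-Gaussian relaxation with its expected-open-gate penalty are illustrative assumptions chosen to mirror the standard stochastic-gates formulation.

```python
import torch
import torch.nn as nn

class ConditionalSTG(nn.Module):
    """Sketch of conditional stochastic gates (c-STG), under assumed details.

    A hypernetwork maps context variables to per-feature gate means; the
    gates are a clipped-Gaussian relaxation of Bernoulli variables and are
    applied element-wise to the features before the prediction model.
    """

    def __init__(self, num_features, context_dim, hidden_dim=32, sigma=0.5):
        super().__init__()
        self.sigma = sigma  # noise scale of the gate relaxation (assumed)
        # Hypernetwork: context -> per-feature gate means mu(c)
        self.hypernet = nn.Sequential(
            nn.Linear(context_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, num_features),
        )
        # Prediction model operating on the gated features
        self.predictor = nn.Sequential(
            nn.Linear(num_features, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, x, context):
        mu = self.hypernet(context)
        noise = self.sigma * torch.randn_like(mu) if self.training else 0.0
        # Hard-sigmoid (clipped-Gaussian) relaxation of the Bernoulli gates
        z = torch.clamp(mu + noise + 0.5, 0.0, 1.0)
        return self.predictor(x * z), mu


# Usage: task loss plus a penalty on the expected number of open gates,
# P(z_d > 0) = Phi((mu_d + 0.5) / sigma) under this relaxation.
model = ConditionalSTG(num_features=10, context_dim=2)
x, c, y = torch.randn(8, 10), torch.randn(8, 2), torch.randn(8, 1)
pred, mu = model(x, c)
open_prob = torch.distributions.Normal(0.0, 1.0).cdf((mu + 0.5) / model.sigma)
loss = nn.functional.mse_loss(pred, y) + 0.1 * open_prob.sum(dim=1).mean()
loss.backward()
```

Because the gates depend on the context through the hypernetwork, different context values can open different subsets of features, which is the contextual behavior the abstract contrasts with population-level selection.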