
Contextual Feature Selection with Conditional Stochastic Gates (2312.14254v2)

Published 21 Dec 2023 in cs.LG and cs.NE

Abstract: Feature selection is a crucial tool in machine learning and is widely applied across various scientific disciplines. Traditional supervised methods generally identify a universal set of informative features for the entire population. However, feature relevance often varies with context, while the context itself may not directly affect the outcome variable. Here, we propose a novel architecture for contextual feature selection where the subset of selected features is conditioned on the value of context variables. Our new approach, Conditional Stochastic Gates (c-STG), models the importance of features using conditional Bernoulli variables whose parameters are predicted from the contextual variables. To learn the context-dependent gates along with a prediction model, we introduce a hypernetwork that maps context variables to feature selection parameters. We further present a theoretical analysis of our model, indicating that it can improve performance and flexibility over population-level methods in complex feature selection settings. Finally, we conduct an extensive benchmark using simulated and real-world datasets across multiple domains, demonstrating that c-STG improves feature selection capabilities while enhancing prediction accuracy and interpretability.
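The sketch below illustrates the idea described in the abstract: a hypernetwork maps context variables to per-feature gate parameters, relaxed Bernoulli gates multiply the features, and a prediction model is trained jointly with an expected-sparsity penalty. It is a minimal illustration assuming a Gaussian-based hard-sigmoid relaxation in the style of stochastic gates (Yamada et al., 2020); all layer sizes, names, and hyperparameters are illustrative assumptions rather than the authors' reference implementation.

```python
import torch
import torch.nn as nn

class ConditionalStochasticGates(nn.Module):
    """Illustrative sketch of c-STG (not the authors' code): a hypernetwork maps
    context variables to per-feature gate means; relaxed Bernoulli gates multiply
    the features before the prediction model."""

    def __init__(self, n_features: int, n_context: int, sigma: float = 0.5):
        super().__init__()
        self.sigma = sigma
        # Hypernetwork: context c -> per-feature gate means mu(c)
        self.hyper = nn.Sequential(
            nn.Linear(n_context, 64), nn.ReLU(), nn.Linear(64, n_features)
        )
        # Prediction model applied to the gated features (regression head here)
        self.predictor = nn.Sequential(
            nn.Linear(n_features, 64), nn.ReLU(), nn.Linear(64, 1)
        )

    def gates(self, context: torch.Tensor):
        mu = self.hyper(context)                     # gate means, one per feature
        eps = self.sigma * torch.randn_like(mu) if self.training else torch.zeros_like(mu)
        z = torch.clamp(mu + eps, 0.0, 1.0)          # hard-sigmoid relaxation of Bernoulli gates
        # Expected number of open gates: P(z > 0) = Phi(mu / sigma)
        reg = torch.distributions.Normal(0.0, 1.0).cdf(mu / self.sigma).sum(dim=-1).mean()
        return z, reg

    def forward(self, x: torch.Tensor, context: torch.Tensor):
        z, reg = self.gates(context)
        y_hat = self.predictor(x * z)                # predict from context-gated features
        return y_hat, reg


# Usage sketch: joint training of the hypernetwork and predictor with a sparsity penalty.
model = ConditionalStochasticGates(n_features=20, n_context=3)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x, c, y = torch.randn(128, 20), torch.randn(128, 3), torch.randn(128, 1)
for _ in range(100):
    y_hat, reg = model(x, c)
    loss = nn.functional.mse_loss(y_hat, y) + 0.1 * reg   # 0.1 is an arbitrary sparsity weight
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Because the gate means are a function of the context rather than free parameters, the selected feature subset can change smoothly with context while the predictor remains shared across the population.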
