Feature Selection as Deep Sequential Generative Learning (2403.03838v1)
Abstract: Feature selection aims to identify the most pattern-discriminative feature subset. In the prior literature, filter methods (e.g., backward elimination) and embedded methods (e.g., Lasso) rely on hyperparameters (e.g., top-K, score thresholds) and are tied to specific models, and are therefore hard to generalize; wrapper methods search a huge discrete space of feature subsets and are computationally costly. To transform the way feature selection is done, we regard a selected feature subset as a sequence of selection-decision tokens and reformulate feature selection as a deep sequential generative learning task that distills feature knowledge and generates decision sequences. Our method comprises three steps: (1) we develop a deep variational transformer model trained over a joint loss of sequential reconstruction, variational, and performance-evaluator terms; the model distills feature selection knowledge and learns a continuous embedding space that maps feature selection decision sequences to embedding vectors associated with utility scores; (2) we leverage the trained feature subset utility evaluator as a gradient provider to guide the identification of the optimal feature subset embedding; (3) we decode the optimal feature subset embedding to autoregressively generate the best feature selection decision sequence, with an automatic stopping criterion (autostop). Extensive experimental results show that this generative perspective is effective and generic, requiring neither a large discrete search space nor expert-specific hyperparameters.
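Below is a minimal, self-contained PyTorch sketch of the three-step pipeline the abstract describes, intended only to make the moving parts concrete. Everything here is an illustrative assumption rather than the authors' implementation: the token vocabulary (feature-index tokens plus BOS/EOS), the model sizes, the loss weights, the toy training pairs, and a GRU decoder standing in for the paper's transformer decoder.

```python
# A minimal sketch of the three-step pipeline, written with PyTorch.
# Vocabulary, model sizes, loss weights, and the toy data are all
# illustrative assumptions, not the authors' released implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, BOS, EOS = 12, 10, 11     # feature-index tokens 0..9 plus BOS/EOS (assumed)
D_MODEL, D_LATENT, MAX_LEN = 32, 16, 8

class SeqVAE(nn.Module):
    """Transformer encoder -> (mu, logvar) -> latent z -> autoregressive decoder,
    plus an evaluator head predicting the utility score of a feature subset.
    The GRU decoder is a simplification of the paper's transformer decoder."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, D_MODEL)
        layer = nn.TransformerEncoderLayer(D_MODEL, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.to_mu = nn.Linear(D_MODEL, D_LATENT)
        self.to_logvar = nn.Linear(D_MODEL, D_LATENT)
        self.decoder = nn.GRU(D_MODEL + D_LATENT, D_MODEL, batch_first=True)
        self.out = nn.Linear(D_MODEL, VOCAB)
        self.evaluator = nn.Sequential(
            nn.Linear(D_LATENT, 32), nn.ReLU(), nn.Linear(32, 1))

    def encode(self, tokens):
        h = self.encoder(self.embed(tokens)).mean(dim=1)   # pooled sequence repr.
        return self.to_mu(h), self.to_logvar(h)

    def decode(self, z, tokens):
        # Teacher-forced decoding: condition every step on the latent z.
        zz = z.unsqueeze(1).expand(-1, tokens.size(1), -1)
        h, _ = self.decoder(torch.cat([self.embed(tokens), zz], dim=-1))
        return self.out(h)

def joint_loss(model, tokens, scores):
    """Step 1 loss: sequential reconstruction + variational (KL) + evaluator MSE."""
    mu, logvar = model.encode(tokens)
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterization
    logits = model.decode(z, tokens[:, :-1])               # predict next token
    rec = F.cross_entropy(logits.reshape(-1, VOCAB), tokens[:, 1:].reshape(-1))
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    mse = F.mse_loss(model.evaluator(z).squeeze(-1), scores)
    return rec + 0.1 * kl + mse                            # 0.1 is an assumed weight

@torch.no_grad()
def generate(model, z):
    """Step 3: decode the optimized embedding; emitting EOS is the autostop."""
    tokens = torch.full((1, 1), BOS, dtype=torch.long)
    for _ in range(MAX_LEN):
        nxt = model.decode(z, tokens)[:, -1].argmax(-1, keepdim=True)
        tokens = torch.cat([tokens, nxt], dim=1)
        if nxt.item() == EOS:
            break
    return tokens[0, 1:]

model = SeqVAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
# Toy training pairs: (decision sequence, utility score of that subset).
seqs = torch.randint(0, 10, (64, MAX_LEN))
seqs[:, 0], seqs[:, -1] = BOS, EOS
scores = torch.rand(64)
for _ in range(50):                       # Step 1: distill selection knowledge
    opt.zero_grad(); joint_loss(model, seqs, scores).backward(); opt.step()

# Step 2: gradient ascent in the embedding space, guided by the evaluator.
for p in model.parameters():
    p.requires_grad_(False)               # freeze the model; optimize z only
z = model.encode(seqs[:1])[0].clone().requires_grad_(True)
z_opt = torch.optim.Adam([z], lr=0.05)
for _ in range(100):
    z_opt.zero_grad(); (-model.evaluator(z).mean()).backward(); z_opt.step()

print("best decision sequence:", generate(model, z.detach()).tolist())
```

Note how the EOS token plays the role of the abstract's autostop: generation halts as soon as the decoder emits it, so the subset size is decided by the model rather than by a top-K hyperparameter.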