Neuro-Symbolic Embedding for Short and Effective Feature Selection via Autoregressive Generation (2404.17157v1)
Abstract: Feature selection aims to identify the optimal feature subset for enhancing downstream models. Effective feature selection can remove redundant features, save computational resources, accelerate learning, and improve a model's overall performance. However, existing methods are often time-intensive when searching for an effective feature subset within a high-dimensional feature space. Moreover, they typically rely on a single downstream task's performance as the selection criterion, yielding subsets that are not only redundant but also lack generalizability. To bridge these gaps, we reformulate feature selection through a neuro-symbolic lens and introduce a novel generative framework that identifies short and effective feature subsets. Specifically, we observe that the feature ID tokens of a selected subset can be treated as symbols reflecting the intricate correlations among features. In this framework, we first build a data collector that automatically gathers numerous feature selection samples, each consisting of feature ID tokens, model performance, and a measurement of feature subset redundancy. Building on the collected data, we develop an encoder-decoder-evaluator learning paradigm that preserves the intelligence of feature selection in a continuous embedding space for efficient search. Within the learned embedding space, we apply a multi-gradient search algorithm to find more robust and generalized embeddings, with the joint objectives of improving model performance and reducing feature subset redundancy. These embeddings are then used to reconstruct the feature ID tokens that execute the final feature selection. Finally, comprehensive experiments and case studies validate the effectiveness of the proposed framework.
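The search step described above can be illustrated with a toy sketch: a feature-subset embedding is refined by combining the gradients of two objectives (raise predicted performance, lower predicted redundancy) before decoding it back into feature ID tokens. The linear "evaluators" and all weights below are hypothetical stand-ins for illustration, not the paper's trained networks.

```python
import numpy as np

# Toy stand-in evaluators: linear scores over a 4-dim embedding space.
# (Hypothetical; the paper uses learned neural evaluators.)
w_perf = np.array([1.0, 1.0, 0.0, 0.0])  # predicts downstream performance
w_red = np.array([0.0, 0.0, 1.0, 1.0])   # predicts subset redundancy

def performance(z):
    """Predicted downstream performance (higher is better)."""
    return float(w_perf @ z)

def redundancy(z):
    """Predicted feature-subset redundancy (lower is better)."""
    return float(w_red @ z)

def multi_gradient_step(z, lr=0.1):
    # Equal-weight combination of the two objective gradients; a full
    # multi-gradient method would compute Pareto-descent weights instead.
    g = w_perf - w_red  # ascend performance, descend redundancy
    return z + lr * g

z = np.array([0.5, -0.5, 0.3, 0.2])  # initial subset embedding
p0, r0 = performance(z), redundancy(z)
for _ in range(50):
    z = multi_gradient_step(z)
print(performance(z) > p0, redundancy(z) < r0)  # True True
```

After the search converges, the refined embedding `z` would be passed to the decoder to reconstruct the feature ID token sequence of the selected subset.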