Deep Feature Screening: Feature Selection for Ultra High-Dimensional Data via Deep Neural Networks (2204.01682v3)
Abstract: Traditional statistical feature selection methods often struggle when applied to high-dimension, low-sample-size data, running into problems such as overfitting, the curse of dimensionality, computational infeasibility, and strong model assumptions. In this paper, we propose a novel two-step nonparametric approach called Deep Feature Screening (DeepFS) that overcomes these problems and identifies significant features with high precision for ultra high-dimensional, low-sample-size data. The approach first extracts a low-dimensional representation of the input data and then applies feature screening based on the multivariate rank distance correlation recently developed by Deb and Sen (2021). DeepFS combines the strengths of deep neural networks and feature screening, and thereby has the following appealing features in addition to its ability to handle ultra high-dimensional data with a small number of samples: (1) it is model-free and distribution-free; (2) it can be used for both supervised and unsupervised feature selection; and (3) it is capable of recovering the original input data. The superiority of DeepFS is demonstrated via extensive simulation studies and real-data analyses.
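To make the two-step pipeline concrete, below is a minimal sketch of one plausible unsupervised instantiation. It is our own illustration, not the authors' implementation: the network sizes, training schedule, and helper names (`train_autoencoder`, `deepfs_screen`) are assumptions, and classical distance correlation (Székely et al., 2007) is swapped in as a stand-in for the multivariate rank distance correlation of Deb and Sen (2021), whose measure-transportation step (matching multivariate ranks to a quasi-uniform reference set such as a Halton or Sobol' sequence) is omitted for brevity.

```python
# Minimal sketch of a DeepFS-style two-step pipeline (assumptions noted above).
import numpy as np
import torch
import torch.nn as nn


def train_autoencoder(X, latent_dim=2, hidden=64, epochs=300, lr=1e-3, seed=0):
    """Step 1: learn a low-dimensional representation Z of the input X."""
    torch.manual_seed(seed)
    n, p = X.shape
    Xt = torch.tensor(X, dtype=torch.float32)
    encoder = nn.Sequential(nn.Linear(p, hidden), nn.ReLU(), nn.Linear(hidden, latent_dim))
    decoder = nn.Sequential(nn.Linear(latent_dim, hidden), nn.ReLU(), nn.Linear(hidden, p))
    opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        Z = encoder(Xt)
        loss = ((decoder(Z) - Xt) ** 2).mean()  # reconstruction loss
        loss.backward()
        opt.step()
    with torch.no_grad():
        return encoder(Xt).numpy()


def distance_correlation(x, Z):
    """Dependence between one feature x of shape (n,) and the embedding Z of shape (n, d).

    Classical distance correlation, used here as a stand-in for the
    multivariate rank distance correlation of Deb and Sen (2021).
    """
    def centered(a):
        # Pairwise Euclidean distance matrix, double-centered.
        D = np.sqrt(((a[:, None, :] - a[None, :, :]) ** 2).sum(-1))
        return D - D.mean(0, keepdims=True) - D.mean(1, keepdims=True) + D.mean()

    A, B = centered(x.reshape(-1, 1)), centered(Z)
    dcov2 = max((A * B).mean(), 0.0)  # clip tiny negative values from rounding
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    return np.sqrt(dcov2 / denom) if denom > 0 else 0.0


def deepfs_screen(X, latent_dim=2, top_k=10):
    """Step 2: rank every feature by its dependence on the learned embedding."""
    Z = train_autoencoder(X, latent_dim=latent_dim)
    scores = np.array([distance_correlation(X[:, j], Z) for j in range(X.shape[1])])
    return np.argsort(scores)[::-1][:top_k], scores


# Toy usage: 50 samples, 1000 features, only the first 5 carry a shared signal.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 1000))
X[:, :5] += 3 * rng.standard_normal((50, 1))  # common latent factor
selected, scores = deepfs_screen(X, top_k=5)
print("selected features:", sorted(selected))
```

In a supervised variant, the screening statistic would measure dependence between each feature and the response (possibly jointly with the embedding) rather than the learned representation alone; the autoencoder's decoder is what gives the method its ability to recover the original input data.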
- Concrete autoencoders for differentiable feature selection and reconstruction. arXiv preprint arXiv:1901.09346.
- On the approximability of minimizing nonzero variables or unsatisfied relations in linear systems. Theoretical Computer Science, 209(1):237–260.
- Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.
- Predictable features elimination: An unsupervised approach to feature selection. In International Conference on Machine Learning, Optimization, and Data Science, pages 399–412. Springer.
- Random search for hyper-parameter optimization. Journal of Machine Learning Research, 13(10):281–305.
- A survey on feature selection methods. Computers & Electrical Engineering, 40(1):16–28. 40th-year commemorative issue.
- Variable-length representation for EC-based feature selection in high-dimensional data. In International Conference on the Applications of Evolutionary Computation (Part of EvoStar), pages 325–340. Springer.
- Deb, N. and Sen, B. (2021). Multivariate rank-based distribution-free nonparametric testing using measure transportation. Journal of the American Statistical Association, pages 1–16.
- Ding, C. H. (2003). Unsupervised feature selection via two-way ordering in gene expression analysis. Bioinformatics, 19(10):1259–1266.
- Sure independence screening for ultrahigh dimensional feature space. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70(5):849–911.
- Ultrahigh dimensional feature selection: beyond the linear model. Journal of Machine Learning Research, 10:2013–2038.
- Deep neural networks for estimation and inference. Econometrica, 89(1):181–213.
- Graph autoencoder-based unsupervised feature selection with broad and local data structure preservation. Neurocomputing, 312:310–323.
- Radical inverse quasi-random point sequence, Algorithm 247. Communications of the ACM, 7(12):701.
- Autoencoder inspired unsupervised feature selection. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 2941–2945.
- Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778.
- Categorical reparameterization with Gumbel-Softmax. arXiv preprint arXiv:1611.01144.
- Drug discovery with explainable artificial intelligence. Nature Machine Intelligence, 2(10):573–584.
- A new local search based hybrid genetic algorithm for feature selection. Neurocomputing, 74(17):2914–2928.
- A survey of feature selection and feature extraction techniques in machine learning. In 2014 Science and Information Conference, pages 372–378.
- Feature selection: a literature review. SmartCR, 4(3):211–229.
- LassoNet: A neural network with feature sparsity. Journal of Machine Learning Research, 22(127):1–29.
- Li, K. (2022). Variable selection for nonlinear Cox regression model via deep learning. arXiv preprint arXiv:2211.09287.
- Calibrating multi-dimensional complex ODE from noisy data via deep neural networks. arXiv preprint arXiv:2106.03591.
- Semiparametric regression for spatial data via deep learning. arXiv preprint arXiv:2301.03747.
- Feature screening via distance correlation learning. Journal of the American Statistical Association, 107(499):1129–1139.
- Deep feature selection: theory and application to identify enhancers and promoters. Journal of Computational Biology, 23(5):322–336.
- Deep neural networks for high dimension, low sample size data. In IJCAI, pages 2287–2293.
- Optimal nonparametric inference via deep neural network. Journal of Mathematical Analysis and Applications, 505(2):125561.
- On deep instrumental variables estimate.
- A survey on feature selection. Procedia Computer Science, 91:919–926. Promoting Business Analytics and Quantitative Management of Technology: 4th International Conference on Information Technology and Quantitative Management (ITQM 2016).
- Variational relevant sample-feature machine: a fully Bayesian approach for embedded feature selection. Neurocomputing, 241:181–190.
- Deep feature selection using a teacher-student network. Neurocomputing, 383:396–408.
- Incremental relevance sample-feature machine: A fast marginal likelihood maximization approach for joint feature selection and classification. Pattern Recognition, 60:835–848.
- Unsupervised feature selection by regularized matrix factorization. Neurocomputing, 273:593–610.
- A review of feature selection techniques in bioinformatics. Bioinformatics, 23(19):2507–2517.
- Group sparse regularization for deep neural networks. Neurocomputing, 241:81–89.
- Schmidt-Hieber, J. (2020). Nonparametric regression using deep neural networks with ReLU activation function. The Annals of Statistics, 48(4):1875–1897.
- FsNet: Feature selection network on high-dimensional biological data. arXiv preprint arXiv:2001.08322.
- Practical Bayesian optimization of machine learning algorithms. Advances in Neural Information Processing Systems, 25.
- Sobol’, I. M. (1967). On the distribution of points in a cube and the approximate evaluation of integrals. Zhurnal Vychislitel’noi Matematiki i Matematicheskoi Fiziki, 7(4):784–802.
- A review of unsupervised feature selection methods. Artificial Intelligence Review, 53(2):907–948.
- Deep-FS: A feature selection algorithm for deep Boltzmann machines. Neurocomputing, 322:22–37.
- Novel unsupervised feature filtering of biological data. Bioinformatics, 22(14):e507–e513.
- Estimation of the mean function of functional data via deep neural networks. Stat, 10(1):e393.
- Genome-wide association analysis by lasso penalized logistic regression. Bioinformatics, 25(6):714–721.
- Ultra high-dimensional nonlinear feature selection for big biological data. IEEE Transactions on Knowledge and Data Engineering, 30(7):1352–1365.
- Prioritizing genetic variants in GWAS with lasso using permutation-assisted tuning. Bioinformatics, 36(12):3811–3817.
- Distribution-free and model-free multivariate feature screening via multivariate rank distance correlation. Journal of Multivariate Analysis, 105081.
- Co-regularized unsupervised feature selection. Neurocomputing, 275:2855–2863.