Sparse-Input Neural Network using Group Concave Regularization (2307.00344v1)
Abstract: Simultaneous feature selection and non-linear function estimation are challenging, especially in high-dimensional settings where the number of variables exceeds the available sample size. In this article, we investigate the problem of feature selection in neural networks. Although the group LASSO has been used to select variables for learning with neural networks, it tends to admit unimportant variables into the model to compensate for its over-shrinkage. To overcome this limitation, we propose a framework of sparse-input neural networks using group concave regularization for feature selection in both low-dimensional and high-dimensional settings. The main idea is to apply a proper concave penalty to the $l_2$ norm of the weights on all outgoing connections of each input node, thus obtaining a neural network that uses only a small subset of the original variables. In addition, we develop an effective algorithm based on backward path-wise optimization to yield stable solution paths and tackle the challenge of complex optimization landscapes. Our extensive simulation studies and real data examples demonstrate satisfactory finite-sample performance of the proposed estimator in feature selection and prediction for modeling continuous, binary, and time-to-event outcomes.
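The core idea — penalizing the $l_2$ norm of each input node's outgoing weights with a concave penalty so that entire input features are zeroed out — can be illustrated with a minimal sketch. This is not the authors' implementation; it assumes the minimax concave penalty (MCP) as the concave penalty (one standard choice), and the names `mcp` and `group_concave_penalty` are illustrative:

```python
import numpy as np

def mcp(t, lam, gamma):
    """Minimax concave penalty, applied elementwise to nonnegative t.

    rho(t) = lam*t - t^2/(2*gamma)  for t <= gamma*lam,
             gamma*lam^2/2          otherwise (flat: no further shrinkage).
    """
    t = np.asarray(t, dtype=float)
    return np.where(t <= gamma * lam,
                    lam * t - t ** 2 / (2.0 * gamma),
                    0.5 * gamma * lam ** 2)

def group_concave_penalty(W1, lam=0.1, gamma=3.0):
    """Concave penalty on the l2 norm of each input feature's weight group.

    W1 has shape (p_inputs, n_hidden): row j collects all outgoing
    connections of input node j. Penalizing ||W1[j, :]||_2 drives whole
    rows to zero, which removes feature j from the network entirely.
    """
    group_norms = np.linalg.norm(W1, axis=1)  # one norm per input feature
    return mcp(group_norms, lam, gamma).sum()

# Toy check: a feature whose outgoing weights are all zero contributes
# no penalty, while an active feature is charged a bounded amount.
W = np.array([[0.0, 0.0],     # feature 1: fully pruned
              [0.5, -0.5]])   # feature 2: active
print(group_concave_penalty(W, lam=0.1, gamma=3.0))
```

Because the MCP flattens out for large group norms, strongly relevant features are not over-shrunk the way they are under the group LASSO, which is the limitation the abstract highlights.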
Authors: Bin Luo, Susan Halabi