Sparse-group boosting -- Unbiased group and variable selection (2206.06344v2)
Abstract: In the presence of grouped covariates, we propose a boosting framework that enforces sparsity both within and between groups. By combining component-wise and group-wise gradient boosting with adjusted degrees of freedom, a model with properties similar to those of the sparse group lasso can be fitted through boosting. We show that within-group and between-group sparsity can be controlled by a mixing parameter, and we discuss similarities to and differences from the mixing parameter in the sparse group lasso. Through simulations, gene data, and agricultural data, we demonstrate the effectiveness and predictive competitiveness of this estimator. The data and simulations suggest that, in the presence of grouped variables, sparse-group boosting is associated with less biased variable selection and higher predictive accuracy than component-wise boosting. Additionally, we propose a way of reducing bias in component-wise boosting through the degrees of freedom.
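The core idea in the abstract — running component-wise (single-covariate) and group-wise base-learners side by side, with their ridge penalties calibrated so that individual learners get df = α and group learners get df = 1 − α — can be illustrated with a short sketch. This is an assumption-laden toy in Python, not the authors' implementation (the paper's ecosystem is R); the function names `ridge_lambda_for_df` and `sparse_group_boost`, the L2 loss, and the specific df split are illustrative choices, not taken verbatim from the paper.

```python
import numpy as np

def ridge_lambda_for_df(X, df_target, lo=1e-8, hi=1e8):
    """Find the ridge penalty giving a base-learner the target effective
    degrees of freedom: df(lam) = sum_i d_i^2 / (d_i^2 + lam), where the
    d_i are the singular values of X. df is decreasing in lam, so bisect
    (in log space) between lo and hi."""
    d2 = np.linalg.svd(X, compute_uv=False) ** 2
    for _ in range(200):
        mid = np.sqrt(lo * hi)
        if np.sum(d2 / (d2 + mid)) > df_target:
            lo = mid  # too flexible -> increase the penalty
        else:
            hi = mid
    return np.sqrt(lo * hi)

def sparse_group_boost(X, y, groups, alpha=0.3, nu=0.1, n_iter=100):
    """Toy sparse-group boosting with L2 loss. One ridge base-learner per
    single covariate (df = alpha) and one per group (df = 1 - alpha); at
    each iteration the learner that best fits the current residuals wins,
    and its fit is added with step length nu."""
    n, p = X.shape
    learners = [np.array([j]) for j in range(p)] + [np.asarray(g) for g in groups]
    dfs = [alpha] * p + [1.0 - alpha] * len(groups)
    lams = [ridge_lambda_for_df(X[:, idx], df) for idx, df in zip(learners, dfs)]
    coef = np.zeros(p)
    selected = []
    for _ in range(n_iter):
        r = y - X @ coef  # negative gradient of the L2 loss = residuals
        best = None
        for k, (idx, lam) in enumerate(zip(learners, lams)):
            Xk = X[:, idx]
            b = np.linalg.solve(Xk.T @ Xk + lam * np.eye(len(idx)), Xk.T @ r)
            sse = np.sum((r - Xk @ b) ** 2)
            if best is None or sse < best[0]:
                best = (sse, k, b)
        _, k, b = best
        coef[learners[k]] += nu * b  # small step toward the winning learner
        selected.append(k)
    return coef, selected
```

With α near 1, single-covariate learners are the more flexible and dominate selection (within-group sparsity); with α near 0, whole groups are selected (between-group sparsity) — mirroring how the abstract describes the mixing parameter's role.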