
Sparse-group boosting -- Unbiased group and variable selection (2206.06344v2)

Published 13 Jun 2022 in stat.ME and stat.ML

Abstract: In the presence of grouped covariates, we propose a boosting framework that enforces sparsity both within and between groups. By combining component-wise and group-wise gradient boosting with adjusted degrees of freedom, a model with properties similar to the sparse group lasso can be fitted through boosting. We show that within-group and between-group sparsity can be controlled by a mixing parameter, and we discuss similarities and differences to the mixing parameter in the sparse group lasso. Using simulations, gene data, and agricultural data, we demonstrate the effectiveness and predictive competitiveness of this estimator. The data and simulations suggest that, in the presence of grouped variables, sparse group boosting yields less biased variable selection and higher predictive accuracy than component-wise boosting. Additionally, we propose a way of reducing bias in component-wise boosting through the degrees of freedom.
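The selection mechanism the abstract describes, fitting component-wise and group-wise ridge base-learners side by side and tying their degrees of freedom to a mixing parameter, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the df assignment (alpha for individual learners, 1 - alpha for group learners) is a stand-in for the paper's exact adjustment, and all function names are invented for this sketch.

```python
import numpy as np

def ridge_df(s2, lam):
    # Effective degrees of freedom of a ridge fit, given the squared
    # singular values s2 of the design and penalty lam.
    return float(np.sum(s2 / (s2 + lam)))

def lambda_for_df(X, target_df):
    # Geometric bisection for the ridge penalty whose effective df
    # matches target_df (df is monotonically decreasing in lambda).
    s2 = np.linalg.svd(X, compute_uv=False) ** 2
    lo, hi = 1e-12, 1e12
    for _ in range(200):
        mid = np.sqrt(lo * hi)
        if ridge_df(s2, mid) > target_df:
            lo = mid  # df too large -> increase penalty
        else:
            hi = mid
    return np.sqrt(lo * hi)

def ridge_fit(X, r, lam):
    # Ridge coefficients for the current residual vector r.
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ r)

def sparse_group_boost(X, y, groups, alpha=0.5, nu=0.1, steps=200):
    """Toy sparse-group boosting for squared-error loss.

    groups : list of column-index arrays.
    alpha  : mixing parameter; individual base-learners get df = alpha,
             group base-learners get df = 1 - alpha (illustrative).
    nu     : learning rate.
    """
    n, p = X.shape
    f = np.zeros(n)
    coef = np.zeros(p)
    # Penalties are precomputed so each base-learner has its target df.
    ind_lam = [lambda_for_df(X[:, [j]], alpha) for j in range(p)]
    grp_lam = [lambda_for_df(X[:, g], 1 - alpha) for g in groups]
    for _ in range(steps):
        r = y - f  # negative gradient of squared-error loss
        best = None
        # Component-wise candidates: one ridge fit per single variable.
        for j in range(p):
            b = ridge_fit(X[:, [j]], r, ind_lam[j])
            rss = np.sum((r - X[:, [j]] @ b) ** 2)
            if best is None or rss < best[0]:
                best = (rss, [j], b)
        # Group-wise candidates: one ridge fit per whole group.
        for g, lam in zip(groups, grp_lam):
            b = ridge_fit(X[:, g], r, lam)
            rss = np.sum((r - X[:, g] @ b) ** 2)
            if rss < best[0]:
                best = (rss, list(g), b)
        # Update only the winning base-learner, as in gradient boosting.
        _, idx, b = best
        coef[idx] += nu * b
        f += nu * (X[:, idx] @ b)
    return coef
```

Because single variables and whole groups compete in the same selection step, the mixing parameter shifts how often individual versus group updates win, which is how within-group and between-group sparsity are traded off.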
