A Balanced Statistical Boosting Approach for GAMLSS via New Step Lengths
Abstract: Component-wise gradient boosting algorithms are popular for their intrinsic variable selection and implicit regularization, which can be especially beneficial for very flexible model classes. When generalized additive models for location, scale and shape (GAMLSS) are estimated by a component-wise gradient boosting algorithm, an important part of the estimation procedure is determining the relative complexity of the submodels corresponding to the different distribution parameters. Existing methods either require a computationally expensive tuning procedure or can be biased by structural differences in the sizes of the negative gradients, which lead to imbalances between the submodels. To address this issue, shrunk optimal step lengths have been suggested as a replacement for the typical small fixed step lengths, but so far only for a non-cyclical boosting algorithm with a Gaussian response variable. In this article, we propose a new adaptive step length approach that accounts for the relative size of the fitted base-learners to ensure a natural balance between the different submodels. The new balanced boosting approach thus represents a computationally efficient and easily generalizable alternative to shrunk optimal step lengths. We implemented the balanced non-cyclical boosting algorithm for Gaussian, negative binomial and Weibull distributed response variables. We demonstrate the competitive performance of the new adaptive step length approach in a simulation study, in an analysis of count data modeling the number of doctor's visits, and in an analysis of survival data from an oncological trial.
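To make the setting concrete, the following is a minimal sketch of the baseline the abstract refers to: non-cyclical component-wise gradient boosting for a Gaussian location and scale model with a small fixed step length. It does not reproduce the balanced step length proposed in the article, whose formula is not given here. The function names, the simple linear base-learners, the simulated data and the value `step=0.1` are all assumptions made for illustration.

```python
import numpy as np

def gaussian_nll(y, eta_mu, eta_sigma):
    # Negative log-likelihood (up to a constant), with eta_sigma = log(sigma).
    return float(np.sum(eta_sigma + (y - eta_mu) ** 2 / (2.0 * np.exp(2.0 * eta_sigma))))

def neg_gradients(y, eta_mu, eta_sigma):
    # Negative gradients of the loss w.r.t. the two additive predictors.
    sigma2 = np.exp(2.0 * eta_sigma)
    return (y - eta_mu) / sigma2, (y - eta_mu) ** 2 / sigma2 - 1.0

def best_base_learner(Z, u):
    # Fit each column separately by least squares; pick the smallest RSS.
    coefs = Z.T @ u / np.sum(Z ** 2, axis=0)
    rss = np.sum((u[:, None] - Z * coefs) ** 2, axis=0)
    j = int(np.argmin(rss))
    return j, float(coefs[j])

def noncyclical_boost(X, y, n_iter=300, step=0.1):
    # Hypothetical helper: column 0 is an intercept learner, the rest are centered covariates.
    n = X.shape[0]
    Z = np.column_stack([np.ones(n), X - X.mean(axis=0)])
    eta = {"mu": np.full(n, y.mean()), "sigma": np.full(n, np.log(y.std()))}
    coef = {"mu": np.zeros(Z.shape[1]), "sigma": np.zeros(Z.shape[1])}
    losses = [gaussian_nll(y, eta["mu"], eta["sigma"])]
    for _ in range(n_iter):
        grads = dict(zip(("mu", "sigma"), neg_gradients(y, eta["mu"], eta["sigma"])))
        candidates = {}
        for name, u in grads.items():
            j, b = best_base_learner(Z, u)
            trial = dict(eta)
            trial[name] = eta[name] + step * b * Z[:, j]  # same fixed step for both submodels
            candidates[name] = (gaussian_nll(y, trial["mu"], trial["sigma"]), j, b)
        # Non-cyclical update: only the submodel with the lowest resulting loss moves.
        name = min(candidates, key=lambda k: candidates[k][0])
        loss, j, b = candidates[name]
        eta[name] = eta[name] + step * b * Z[:, j]
        coef[name][j] += step * b
        losses.append(loss)
    return coef, losses

# Simulated example (assumed data): the mean depends on x0, log(sigma) on x1.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4))
y = 2.0 * X[:, 0] + np.exp(0.5 * X[:, 1]) * rng.normal(size=500)
coef, losses = noncyclical_boost(X, y)
```

Because the negative gradient for the location predictor is scaled by 1/sigma^2 while the one for the scale predictor is not, the same fixed step can move the two submodels by very different amounts. This is the kind of imbalance that adaptive step lengths are meant to counteract.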