Weighted-Average Least Squares for Negative Binomial Regression
Abstract: Model averaging methods have become an increasingly popular tool for improving predictions and dealing with model uncertainty, especially in Bayesian settings. Recently, frequentist model averaging methods such as information-theoretic and least squares model averaging have emerged. This work focuses on covariate uncertainty, where managing computational resources is key: the model space grows exponentially with the number of covariates, so averaged models must often be approximated. Weighted-average least squares (WALS), first introduced for (generalized) linear models in the econometric literature, combines Bayesian and frequentist aspects and additionally employs a semiorthogonal transformation of the regressors to reduce the computational burden. This paper extends WALS for generalized linear models to the negative binomial (NB) regression model for overdispersed count data. A simulation experiment and an empirical application using data on doctor visits were conducted to compare the predictive power of WALS for NB regression with that of traditional estimators. The results show that WALS for NB improves on the maximum likelihood estimator in sparse situations and is competitive with lasso while being computationally more efficient.
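Two quantities mentioned in the abstract can be illustrated numerically: the exponential growth of the model space and the overdispersion that motivates the NB model. The following is a minimal sketch, not the paper's implementation. It assumes the common NB2 parameterization (mean mu, dispersion theta, variance mu + mu^2/theta), which the abstract itself does not specify, and all function names are hypothetical.

```python
import math


def n_submodels(k):
    """Number of candidate models when each of k covariates is in or out."""
    return 2 ** k


def nb2_pmf(y, mu, theta):
    """Probability of count y under NB2 (assumed parameterization):
    mean mu > 0, dispersion theta > 0, variance mu + mu**2 / theta."""
    log_p = (
        math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
        + theta * math.log(theta / (theta + mu))
        + y * math.log(mu / (theta + mu))
    )
    return math.exp(log_p)


def nb2_moments(mu, theta, y_max=2000):
    """Mean and variance obtained by summing the pmf over 0..y_max.
    The truncation point y_max is an illustrative choice."""
    probs = [nb2_pmf(y, mu, theta) for y in range(y_max + 1)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var


if __name__ == "__main__":
    # Model space size for a few covariate counts.
    for k in (5, 10, 20):
        print(k, n_submodels(k))
    # Numerical moments versus the closed-form NB2 variance.
    mu, theta = 3.0, 2.0
    mean, var = nb2_moments(mu, theta)
    print(mean, var, mu + mu ** 2 / theta)
```

Under this parameterization the variance always exceeds the mean, whereas a Poisson model forces the two to be equal; as theta grows, the NB2 variance approaches the Poisson case.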