Bayesian Quantile Regression with Subset Selection: A Decision Analysis Perspective (2311.02043v4)
Abstract: Quantile regression is a powerful tool for inferring how covariates affect specific percentiles of the response distribution. Existing methods either estimate conditional quantiles separately for each quantile of interest or estimate the entire conditional distribution using semi- or non-parametric models. The former often produce inadequate models for real data and do not share information across quantiles, while the latter are characterized by complex and constrained models that can be difficult to interpret and computationally inefficient. Neither approach is well-suited for quantile-specific subset selection. Instead, we pose the fundamental problems of linear quantile estimation, uncertainty quantification, and subset selection from a Bayesian decision analysis perspective. For any Bayesian regression model -- including, but not limited to, existing Bayesian quantile regression models -- we derive optimal point estimates, interpretable uncertainty quantification, and scalable subset selection techniques for all model-based conditional quantiles. Our approach introduces a quantile-focused squared error loss that enables efficient, closed-form computing and maintains a close relationship with Wasserstein-based density estimation. In an extensive simulation study, our methods demonstrate substantial gains in quantile estimation accuracy, inference, and variable selection over frequentist and Bayesian competitors. We use these tools to identify and quantify the heterogeneous impacts of multiple social stressors and environmental exposures on educational outcomes across the full spectrum of low-, medium-, and high-achieving students in North Carolina.
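The closed-form point estimate mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact form of the quantile-focused squared error loss, so the code below assumes it is the squared distance between a linear action `X @ delta` and the model-based conditional quantiles, averaged over the posterior. Under that assumption the minimizer is an ordinary least-squares fit to the posterior mean of the conditional quantiles. The Gaussian location-scale model, the simulated "posterior draws", and the names `quantile_point_estimate`, `beta_draws`, and `gamma_draws` are all hypothetical and are not taken from the paper.

```python
import numpy as np
from statistics import NormalDist


def quantile_point_estimate(X, Q_draws):
    """Project the posterior mean conditional quantiles onto the columns of X.

    X       : (n, p) design matrix.
    Q_draws : (S, n) posterior draws of the model-based quantile Q_tau(x_i).
    Returns the (p,) least-squares coefficients, which minimize the
    posterior expected squared error ||Q_tau(X) - X @ delta||^2
    (the assumed form of the loss).
    """
    Q_bar = Q_draws.mean(axis=0)                      # posterior mean quantile at each x_i
    coef, *_ = np.linalg.lstsq(X, Q_bar, rcond=None)  # closed-form least squares
    return coef


rng = np.random.default_rng(0)
n, S, tau = 200, 500, 0.9
X = np.column_stack([np.ones(n), rng.normal(size=n)])

# Hypothetical posterior draws for a heteroscedastic Gaussian model:
#   y | x ~ Normal(x' beta, exp(x' gamma)^2)
beta_draws = np.array([1.0, 2.0]) + 0.05 * rng.normal(size=(S, 2))
gamma_draws = np.array([-0.5, 0.3]) + 0.05 * rng.normal(size=(S, 2))

# Model-based conditional tau-quantile for each draw and each observation.
z_tau = NormalDist().inv_cdf(tau)
Q_draws = beta_draws @ X.T + np.exp(gamma_draws @ X.T) * z_tau  # shape (S, n)

delta_hat = quantile_point_estimate(X, Q_draws)
print(delta_hat)  # linear summary of the tau = 0.9 conditional quantile
```

Because the fit depends on the posterior only through the mean of the conditional quantiles, the same posterior draws can be reused for every quantile level `tau` and for every candidate subset of columns of `X`, which is one way to read the abstract's claim of efficient computing.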