BARTSIMP: flexible spatial covariate modeling and prediction using Bayesian additive regression trees
Abstract: Prediction is a classic challenge in spatial statistics, and spatial covariates can greatly improve predictive performance when incorporated into a model with latent spatial effects. It is desirable to develop flexible regression models that allow for nonlinearities and interactions in the covariate specification. Existing machine learning approaches that allow for spatial dependence in the residuals fail to provide reliable uncertainty estimates. In this paper, we investigate the combination of a Gaussian process spatial model with a Bayesian Additive Regression Tree (BART) model. The computational burden of the approach is reduced by combining Markov chain Monte Carlo (MCMC) with the Integrated Nested Laplace Approximation (INLA) technique. We first study the performance of the method via simulation. We then use the model to predict anthropometric responses in Kenya, using household survey data collected via stratified two-stage unequal probability cluster sampling, a complex design that requires special care in modeling.
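As a rough illustration of the kind of model the abstract describes, one plausible way to write a BART mean combined with a Gaussian process residual is sketched below. The notation is an assumption made here for illustration and is not taken from the paper: $m$ trees with structures $T_j$ and leaf values $M_j$, a zero-mean Gaussian process $u$ with covariance parameters $\theta$, and independent Gaussian noise. The second display shows the generic Metropolis-Hastings acceptance probability for a proposed tree change, in which the conditional marginal likelihood $\pi(y \mid T)$ could be approximated by INLA, since the model is latent Gaussian once the trees are fixed.

```latex
% Hypothetical notation: a sketch, not the paper's exact specification.
\begin{align*}
  y(s_i) &= \sum_{j=1}^{m} g\bigl(x(s_i);\, T_j, M_j\bigr) + u(s_i) + \epsilon_i,
    \qquad \epsilon_i \sim \mathcal{N}(0, \sigma^2), \\
  u(\cdot) &\sim \mathcal{GP}\bigl(0,\, k_\theta(\cdot,\cdot)\bigr).
\end{align*}

% Generic Metropolis-Hastings step for a proposed tree configuration T',
% with q the proposal density and \pi(y \mid T) approximated by INLA.
\begin{equation*}
  \alpha(T \to T') = \min\left\{ 1,\;
    \frac{\pi(y \mid T')\, \pi(T')\, q(T \mid T')}
         {\pi(y \mid T)\, \pi(T)\, q(T' \mid T)} \right\}
\end{equation*}
```

Under this reading, only the tree structures need to be explored by MCMC, while the spatial field and hyperparameters are handled by the Laplace approximation inside each likelihood evaluation.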