flexBART: Flexible Bayesian regression trees with categorical predictors (2211.04459v3)

Published 8 Nov 2022 in stat.ME and stat.ML

Abstract: Most implementations of Bayesian additive regression trees (BART) one-hot encode categorical predictors, replacing each one with several binary indicators, one for every level or category. Regression trees built with these indicators partition the discrete set of categorical levels by repeatedly removing one level at a time. Unfortunately, the vast majority of partitions cannot be built with this strategy, severely limiting BART's ability to partially pool data across groups of levels. Motivated by analyses of baseball data and neighborhood-level crime dynamics, we overcame this limitation by re-implementing BART with regression trees that can assign multiple levels to both branches of a decision tree node. To model spatial data aggregated into small regions, we further proposed a new decision rule prior that creates spatially contiguous regions by deleting a random edge from a random spanning tree of a suitably defined network. Our re-implementation, which is available in the flexBART package, often yields improved out-of-sample predictive performance and scales better to larger datasets than existing implementations of BART.
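
To make the limitation concrete, here is a short illustrative Python sketch (ours, not the paper's; the four-level predictor and level names are hypothetical). At a single decision node, a one-hot indicator split can only separate one level from the rest, so an L-level predictor admits just L distinct splits. A rule that assigns an arbitrary nonempty proper subset of levels to the left branch, as in the paper's re-implementation, admits 2^(L-1) - 1 distinct splits, including ones such as {A, C} vs {B, D} that no single indicator split can express.

```python
from itertools import combinations

levels = ["A", "B", "C", "D"]
L = len(levels)

# One-hot encoding: a node splits on one binary indicator, i.e.
# "x == level" vs "x != level", so only L = 4 distinct splits exist,
# each peeling a single level off the rest.
onehot_splits = [({lev}, set(levels) - {lev}) for lev in levels]

# Subset rules: any nonempty proper subset of levels may go left,
# giving 2^(L-1) - 1 = 7 distinct (unordered) splits.
subset_splits = []
for r in range(1, L):
    for combo in combinations(levels, r):
        left, right = set(combo), set(levels) - set(combo)
        if (right, left) not in subset_splits:  # count {S, S^c} once
            subset_splits.append((left, right))

print(len(onehot_splits))  # 4
print(len(subset_splits))  # 7, e.g. ({'A', 'C'}, {'B', 'D'})
```

Across a full tree the gap compounds: repeatedly removing one level at a time reaches only a small fraction of all partitions of the level set, which is the pooling limitation described above.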

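The spatial decision-rule prior can be sketched in the same spirit. The snippet below is an illustrative Python mock-up, not the flexBART implementation: it samples a random spanning tree of a small grid-adjacency network (here via Wilson's loop-erased random-walk algorithm for uniform spanning trees; the paper only requires a random spanning tree of a suitably defined network), deletes one uniformly chosen tree edge, and reads off the two resulting components. Because every tree edge is an edge of the original network, both components are spatially contiguous, which is the property the prior uses to split areal units.

```python
import random
from collections import defaultdict

def wilson_spanning_tree(nodes, adjacency, rng):
    """Uniform random spanning tree via Wilson's loop-erased random walks."""
    nodes = list(nodes)
    in_tree = {nodes[0]}
    parent = {}
    for start in nodes:
        if start in in_tree:
            continue
        u, succ = start, {}
        while u not in in_tree:                 # walk until we hit the tree;
            succ[u] = rng.choice(adjacency[u])  # overwriting erases loops
            u = succ[u]
        u = start
        while u not in in_tree:                 # retrace the loop-erased path
            parent[u] = succ[u]
            in_tree.add(u)
            u = succ[u]
    return [(u, v) for u, v in parent.items()]

# A 4x4 grid of areal units; edges join rook-adjacent cells.
side = 4
cells = [(i, j) for i in range(side) for j in range(side)]
adjacency = defaultdict(list)
for i, j in cells:
    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        if 0 <= i + di < side and 0 <= j + dj < side:
            adjacency[(i, j)].append((i + di, j + dj))

rng = random.Random(42)
tree = wilson_spanning_tree(cells, adjacency, rng)

# Delete one uniformly chosen tree edge: the spanning tree falls apart
# into exactly two connected pieces.
dropped = rng.choice(tree)
remaining = defaultdict(list)
for a, b in tree:
    if (a, b) != dropped:
        remaining[a].append(b)
        remaining[b].append(a)

# Flood-fill one side of the cut; both sides are contiguous in the grid.
left, stack = {dropped[0]}, [dropped[0]]
while stack:
    a = stack.pop()
    for b in remaining[a]:
        if b not in left:
            left.add(b)
            stack.append(b)
right = set(cells) - left
print(sorted(left))
print(sorted(right))
```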