
Prediction Algorithms Achieving Bayesian Decision Theoretical Optimality Based on Decision Trees as Data Observation Processes (2306.07060v1)

Published 12 Jun 2023 in cs.LG and stat.ML

Abstract: In the field of decision trees, most previous studies have difficulty ensuring statistically optimal prediction on new data and suffer from overfitting, because trees are usually used only to represent prediction functions constructed from the given data. In contrast, some studies, including this paper, use trees to represent the stochastic data observation process behind the given data. They then derive the statistically optimal prediction, which is robust against overfitting, from Bayesian decision theory by assuming a prior distribution over the trees. However, these studies still face a problem in computing this Bayes optimal prediction, because it involves an infeasible summation over all division patterns of the feature space, which are represented by the trees and some parameters. In particular, an open problem is the summation with respect to combinations of division axes, i.e., the assignments of features to the inner nodes of the tree. We solve this with a Markov chain Monte Carlo method whose step size is adaptively tuned according to a posterior distribution over the trees.
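The abstract compresses two technical points that are worth unpacking. First, the Bayes optimal prediction it refers to is a posterior-weighted average over every candidate tree and its parameters. In generic notation (not the paper's own symbols, and assuming squared-error loss for concreteness), it has the schematic form:

```latex
% Schematic Bayes optimal prediction under squared-error loss
% (generic notation; the paper's symbols and loss may differ).
\hat{y}(x_{\mathrm{new}})
  = \mathbb{E}\left[ y \mid x_{\mathrm{new}}, \mathcal{D} \right]
  = \sum_{T} p(T \mid \mathcal{D})
    \int p\left( y \mid x_{\mathrm{new}}, T, \theta \right)
         p(\theta \mid T, \mathcal{D}) \, d\theta .
```

The outer sum ranges over all division patterns of the feature space, i.e., tree shapes together with the features assigned to their inner nodes, and it is exactly this sum that is infeasible to evaluate directly and that the paper approximates by MCMC.

Second, to make the MCMC idea concrete, below is a minimal, hypothetical sketch of a Metropolis-Hastings sampler over feature-to-node assignments. It is not the paper's algorithm: log_posterior, the proposal, and the adaptation rule are all illustrative assumptions, and the "step size" is taken here to be the number of inner nodes whose split feature is resampled per proposal.

```python
import math
import random

def metropolis_feature_assignment(log_posterior, n_nodes, n_features,
                                  n_iters=10_000, target_accept=0.25):
    """Metropolis-Hastings over feature-to-node assignments (illustrative).

    log_posterior(assignment) is a placeholder for the log of the
    (unnormalized) posterior of the tree splitting on those features.
    """
    state = [random.randrange(n_features) for _ in range(n_nodes)]
    log_p = log_posterior(state)
    step = 1                 # nodes resampled per proposal ("step size")
    accepted = 0
    samples = []
    for t in range(1, n_iters + 1):
        # Propose: redraw the split feature at `step` randomly chosen nodes.
        proposal = state[:]
        for node in random.sample(range(n_nodes), step):
            proposal[node] = random.randrange(n_features)
        log_p_new = log_posterior(proposal)
        # Symmetric proposal, so accept with probability min(1, posterior
        # ratio); the tiny constant guards against log(0).
        if log_p_new - log_p > math.log(random.random() + 1e-300):
            state, log_p = proposal, log_p_new
            accepted += 1
        samples.append(state[:])
        # Crude adaptation toward a target acceptance rate; a careful
        # implementation would confine this to a burn-in phase so the
        # chain's stationary distribution is not disturbed.
        if t % 100 == 0:
            rate = accepted / t
            if rate > target_accept and step < n_nodes:
                step += 1
            elif rate < target_accept and step > 1:
                step -= 1
    return samples
```

A Bayes optimal prediction would then be approximated by averaging the per-tree predictive distributions over the sampled assignments, in place of the infeasible exact summation.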

