
MAPTree: Beating "Optimal" Decision Trees with Bayesian Decision Trees (2309.15312v3)

Published 26 Sep 2023 in cs.LG and cs.AI

Abstract: Decision trees remain one of the most popular machine learning models today, largely due to their out-of-the-box performance and interpretability. In this work, we present a Bayesian approach to decision tree induction via maximum a posteriori inference over a posterior distribution on trees. We first demonstrate a connection between maximum a posteriori inference of decision trees and AND/OR search. Using this connection, we propose an AND/OR search algorithm, dubbed MAPTree, which is able to recover the maximum a posteriori tree. Lastly, we demonstrate the empirical performance of the maximum a posteriori tree both on synthetic data and in real-world settings. On 16 real-world datasets, MAPTree either outperforms baselines or demonstrates comparable performance but with much smaller trees. On a synthetic dataset, MAPTree also demonstrates greater robustness to noise and better generalization than existing approaches. Finally, MAPTree recovers the maximum a posteriori tree faster than existing sampling approaches and, in contrast with those algorithms, is able to provide a certificate of optimality. The code for our experiments is available at https://github.com/ThrunGroup/maptree.
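To make the objective in the abstract concrete, the sketch below scores trees by a Bayesian CART-style log posterior (Beta-Bernoulli leaf likelihoods plus a split prior) and finds the maximizer by brute-force enumeration. This is not the paper's AND/OR branch-and-bound algorithm, which optimizes the same kind of objective far more efficiently with pruning bounds; the depth-independent split probability `p_split` and the uniform prior over split features are simplifying assumptions for illustration.

```python
from math import log, lgamma

def leaf_loglik(y, alpha=1.0):
    """Log marginal likelihood of binary labels y at a leaf under a
    Beta(alpha, alpha) prior on the leaf's label probability."""
    n, n1 = len(y), sum(y)
    return (lgamma(2 * alpha) - lgamma(2 * alpha + n)
            + lgamma(alpha + n1) + lgamma(alpha + (n - n1))
            - 2 * lgamma(alpha))

def map_tree(X, y, depth=0, max_depth=2, p_split=0.5):
    """Brute-force MAP search over axis-aligned trees on binary features.

    Returns (log_posterior, tree), where tree is None for a leaf or
    (feature_index, left_subtree, right_subtree) for an internal node.
    """
    # Option 1: stop here. The tree is a leaf with prior prob 1 - p_split.
    best = (log(1 - p_split) + leaf_loglik(y), None)
    if depth < max_depth and len(set(y)) > 1:
        for j in range(len(X[0])):
            left = [i for i in range(len(X)) if X[i][j] == 0]
            right = [i for i in range(len(X)) if X[i][j] == 1]
            if not left or not right:
                continue  # a split must send data down both branches
            sl, tl = map_tree([X[i] for i in left], [y[i] for i in left],
                              depth + 1, max_depth, p_split)
            sr, tr = map_tree([X[i] for i in right], [y[i] for i in right],
                              depth + 1, max_depth, p_split)
            # Prior: split with prob p_split, feature chosen uniformly.
            score = log(p_split) - log(len(X[0])) + sl + sr
            if score > best[0]:
                best = (score, (j, tl, tr))
    return best
```

For example, on eight points whose label equals feature 0, the search returns a single split on feature 0, because the gain in marginal likelihood at the pure children outweighs the prior cost of splitting. MAPTree's contribution is recovering this same maximizer without enumeration, along with a certificate that no higher-posterior tree exists.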
