MAPTree: Beating "Optimal" Decision Trees with Bayesian Decision Trees (2309.15312v3)
Abstract: Decision trees remain one of the most popular machine learning models today, largely due to their out-of-the-box performance and interpretability. In this work, we present a Bayesian approach to decision tree induction via maximum a posteriori inference of a posterior distribution over trees. We first demonstrate a connection between maximum a posteriori inference of decision trees and AND/OR search. Using this connection, we propose an AND/OR search algorithm, dubbed MAPTree, which is able to recover the maximum a posteriori tree. Lastly, we demonstrate the empirical performance of the maximum a posteriori tree both on synthetic data and in real-world settings. On 16 real-world datasets, MAPTree either outperforms baselines or demonstrates comparable performance but with much smaller trees. On a synthetic dataset, MAPTree also demonstrates greater robustness to noise and better generalization than existing approaches. Finally, MAPTree recovers the maximum a posteriori tree faster than existing sampling approaches and, in contrast with those algorithms, is able to provide a certificate of optimality. The code for our experiments is available at https://github.com/ThrunGroup/maptree.
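To make the idea of maximum a posteriori inference over trees concrete, the sketch below scores binary decision trees by an unnormalized log posterior and finds the best one by memoized exhaustive recursion. It is not the MAPTree algorithm from the paper: it uses no heuristics or bounds, and every modelling choice is an assumption made for illustration. These include the split prior `alpha * (1 + depth) ** (-beta)`, a uniform choice among features, a Beta prior `rho` on leaf label probabilities, binary features and labels, and the `max_depth` cap. The names `map_tree`, `solve`, and `leaf_log_lik` are hypothetical. The recursion does mirror the AND/OR structure mentioned in the abstract: at each subset of rows we choose one option (stop or split on some feature, an OR choice), and a split requires solving both children (an AND requirement).

```python
import math
from functools import lru_cache


def log_beta(a, b):
    """Log of the Beta function, computed via log-gamma for stability."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def leaf_log_lik(c0, c1, rho=(1.0, 1.0)):
    """Log marginal likelihood of c0 zeros and c1 ones under a Beta(rho) prior."""
    return log_beta(c0 + rho[0], c1 + rho[1]) - log_beta(rho[0], rho[1])


def map_tree(X, y, alpha=0.95, beta=0.5, max_depth=3):
    """Return (log posterior up to a constant, tree) for binary X and y.

    Assumed prior (illustrative): a node at depth d splits with probability
    alpha * (1 + d) ** (-beta) and picks its feature uniformly at random.
    """
    n_features = len(X[0])

    @lru_cache(maxsize=None)
    def solve(idx, depth):
        c1 = sum(y[i] for i in idx)
        c0 = len(idx) - c1
        p_split = alpha * (1 + depth) ** (-beta) if depth < max_depth else 0.0

        # OR choice 1: stop here and make this node a leaf.
        best = (math.log(1.0 - p_split) + leaf_log_lik(c0, c1), ("leaf", c0, c1))
        if p_split == 0.0:
            return best

        # OR choices 2..: split on some feature; both children must be solved.
        for f in range(n_features):
            left = tuple(i for i in idx if X[i][f] == 0)
            right = tuple(i for i in idx if X[i][f] == 1)
            if not left or not right:
                continue  # split does not separate the rows
            left_score, left_tree = solve(left, depth + 1)
            right_score, right_tree = solve(right, depth + 1)
            score = (math.log(p_split) - math.log(n_features)
                     + left_score + right_score)
            if score > best[0]:
                best = (score, ("split", f, left_tree, right_tree))
        return best

    return solve(tuple(range(len(X))), 0)


# Toy data: the label is the XOR of two binary features, repeated 5 times.
X = [[0, 0], [0, 1], [1, 0], [1, 1]] * 5
y = [a ^ b for a, b in X]
score, tree = map_tree(X, y)
print(score, tree)
```

Because the recursion enumerates every option at every reachable subset, the returned tree is optimal for this toy posterior by construction, but the cost grows quickly with the number of features and the depth cap. Searching the same space with pruning is what makes the problem tractable at realistic sizes.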