Fast Interpretable Greedy-Tree Sums (2201.11931v3)
Abstract: Modern machine learning has achieved impressive prediction performance, but often sacrifices interpretability, a critical consideration in high-stakes domains such as medicine. In such settings, practitioners often use highly interpretable decision tree models, but these suffer from inductive bias against additive structure. To overcome this bias, we propose Fast Interpretable Greedy-Tree Sums (FIGS), which generalizes the CART algorithm to simultaneously grow a flexible number of trees in summation. By combining logical rules with addition, FIGS is able to adapt to additive structure while remaining highly interpretable. Extensive experiments on real-world datasets show that FIGS achieves state-of-the-art prediction performance. To demonstrate the usefulness of FIGS in high-stakes domains, we adapt FIGS to learn clinical decision instruments (CDIs), which are tools for guiding clinical decision-making. Specifically, we introduce a variant of FIGS known as G-FIGS that accounts for the heterogeneity in medical data. G-FIGS derives CDIs that reflect domain knowledge and enjoy improved specificity (by up to 20% over CART) without sacrificing sensitivity or interpretability. To provide further insight into FIGS, we prove that FIGS learns components of additive models, a property we refer to as disentanglement. Further, we show (under oracle conditions) that unconstrained tree-sum models leverage disentanglement to generalize more efficiently than single decision tree models when fitted to additive regression functions. Finally, to avoid overfitting with an unconstrained number of splits, we develop Bagging-FIGS, an ensemble version of FIGS that borrows the variance reduction techniques of random forests. Bagging-FIGS enjoys competitive performance with random forests and XGBoost on real-world datasets.
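The core idea of a greedy tree sum can be illustrated with a deliberately simplified sketch: repeatedly fit a depth-1 tree (a stump) to the current residual and add it to a running sum. This is not the actual FIGS algorithm, which at every step jointly considers extending any existing tree versus starting a new one; all function names below are hypothetical and chosen for illustration only.

```python
import numpy as np

def best_stump(X, resid):
    """Exhaustively find the single split (feature, threshold) that
    minimizes squared error when fitting the residual with two constants."""
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:
            left = X[:, j] <= t
            lv, rv = resid[left].mean(), resid[~left].mean()
            sse = ((resid[left] - lv) ** 2).sum() + ((resid[~left] - rv) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, j, t, lv, rv)
    return best

def predict_stump(stump, X):
    _, j, t, lv, rv = stump
    return np.where(X[:, j] <= t, lv, rv)

def fit_stump_sum(X, y, n_stumps=4):
    """Greedily grow a sum of stumps, each fit to the residual so far."""
    stumps, resid = [], y.astype(float).copy()
    for _ in range(n_stumps):
        s = best_stump(X, resid)
        stumps.append(s)
        resid -= predict_stump(s, X)  # update residual after adding this stump
    return stumps

def predict_sum(stumps, X):
    return sum(predict_stump(s, X) for s in stumps)

# Additive ground truth: separate stumps can capture each component.
rng = np.random.default_rng(0)
X = rng.random((200, 2))
y = 2 * (X[:, 0] > 0.5) + 3 * (X[:, 1] > 0.3)
stumps = fit_stump_sum(X, y, n_stumps=4)
mse = ((y - predict_sum(stumps, X)) ** 2).mean()
```

Because the target is additive, each stump in the sum can absorb one component, mirroring the disentanglement property the abstract describes; a single tree would instead need exponentially many leaves to represent the same function.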