Inherently Interpretable Tree Ensemble Learning (2410.19098v1)
Abstract: Tree ensemble models like random forests and gradient boosting machines are widely used in machine learning due to their excellent predictive performance. However, a high-performance ensemble consisting of a large number of decision trees lacks sufficient transparency and explainability. In this paper, we demonstrate that when shallow decision trees are used as base learners, ensemble learning algorithms can not only become inherently interpretable, admitting an equivalent representation as generalized additive models, but also sometimes achieve better generalization performance. First, an interpretation algorithm is developed that converts the tree ensemble into a functional ANOVA representation with inherent interpretability. Second, two strategies are proposed to further enhance model interpretability: adding constraints during model training and pruning effects post hoc. Experiments on simulated and real-world datasets show that our proposed methods offer a better trade-off between model interpretation and predictive performance than benchmark methods.
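The core observation is that a tree of depth at most two splits on at most two features, so summing the ensemble's trees grouped by the features they use yields main effects f_j(x_j) and pairwise interactions f_jk(x_j, x_k), i.e., a functional ANOVA / GAM form. Below is a minimal sketch of that grouping step, not the authors' implementation: it uses scikit-learn's `GradientBoostingRegressor` with `max_depth=2`, the helper names (`effect_groups`, `effect_value`) are illustrative, and the centering/purification needed to make the components identifiable is omitted.

```python
# Sketch: decompose a shallow-tree gradient boosting model into
# additive effects by grouping trees on the features they split on.
# Assumptions: depth-2 base learners; purification/centering omitted.
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_friedman1(n_samples=500, noise=0.1, random_state=0)

# Shallow base learners: max_depth=2 limits every tree to <= 2 features.
gbm = GradientBoostingRegressor(max_depth=2, n_estimators=200,
                                learning_rate=0.1, random_state=0)
gbm.fit(X, y)

# Group trees by the (at most two) features they split on.
effect_groups = {}  # frozenset of feature indices -> list of trees
for stage in gbm.estimators_:   # shape (n_estimators, 1) for regression
    tree = stage[0].tree_
    used = frozenset(f for f in tree.feature if f >= 0)  # -2 marks leaves
    effect_groups.setdefault(used, []).append(stage[0])

# Each group defines one additive component; summing a group's trees
# (scaled by the learning rate) gives that effect's contribution.
def effect_value(trees, X):
    return sum(gbm.learning_rate * t.predict(X) for t in trees)

for feats, trees in sorted(effect_groups.items(), key=lambda kv: sorted(kv[0])):
    kind = "main effect" if len(feats) == 1 else "interaction"
    contrib = effect_value(trees, X)
    print(f"{kind} on features {sorted(feats)}: "
          f"{len(trees)} trees, variance {contrib.var():.4f}")
```

The per-effect variance printed at the end is one plausible importance proxy for the paper's post-hoc effect pruning idea: components whose contribution variance is negligible could be dropped with little impact on predictions.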