Inherently Interpretable Tree Ensemble Learning (2410.19098v1)

Published 24 Oct 2024 in stat.ML and cs.LG

Abstract: Tree ensemble models such as random forests and gradient boosting machines are widely used in machine learning because of their excellent predictive performance. However, a high-performance ensemble consisting of a large number of decision trees lacks transparency and explainability. In this paper, we demonstrate that when shallow decision trees are used as base learners, ensemble learning algorithms can not only become inherently interpretable, via an equivalent representation as generalized additive models, but can also sometimes achieve better generalization performance. First, we develop an interpretation algorithm that converts the tree ensemble into a functional ANOVA representation with inherent interpretability. Second, we propose two strategies to further enhance model interpretability: adding constraints during model training, and post-hoc effect pruning. Experiments on simulated and real-world datasets show that the proposed methods offer a better trade-off between model interpretability and predictive performance than their benchmark counterparts.
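The GAM equivalence described in the abstract is easiest to see with depth-1 trees (stumps): each stump depends on exactly one feature, so the ensemble decomposes exactly into an intercept plus one shape function per feature. The sketch below (not the paper's algorithm; it uses scikit-learn's `GradientBoostingRegressor` and assumes every fitted stump actually splits) groups stumps by their root split feature and verifies the additive reconstruction:

```python
# Sketch: a gradient boosting ensemble of depth-1 stumps is an additive
# (GAM) model. Grouping stumps by split feature recovers the per-feature
# shape functions exactly.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(500, 3))
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.normal(size=500)

gbm = GradientBoostingRegressor(max_depth=1, n_estimators=300).fit(X, y)

def shape_function(gbm, j, values, n_features):
    """Main effect f_j: summed contributions of all stumps splitting feature j."""
    grid = np.zeros((len(values), n_features))
    grid[:, j] = values  # other columns are irrelevant to these stumps
    f = np.zeros(len(values))
    for stage in gbm.estimators_:
        stump = stage[0]
        if stump.tree_.feature[0] == j:  # root split feature of the stump
            f += gbm.learning_rate * stump.predict(grid)
    return f

# Exact GAM reconstruction: intercept + sum of main effects.
X_test = rng.uniform(-1, 1, size=(20, 3))
intercept = gbm.init_.predict(X_test)  # constant baseline (mean of y)
recon = intercept + sum(
    shape_function(gbm, j, X_test[:, j], 3) for j in range(3)
)
assert np.allclose(recon, gbm.predict(X_test))
```

With `max_depth=2`, each tree involves at most two features, so the same grouping yields main effects plus pairwise interactions, i.e., a truncated functional ANOVA expansion of the kind the paper builds on.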
