Unboxing Tree Ensembles for interpretability: a hierarchical visualization tool and a multivariate optimal re-built tree (2302.07580v2)
Abstract: The interpretability of models has become a crucial issue in Machine Learning because of the growing impact of algorithmic decisions on real-world applications. Tree ensemble methods, such as Random Forests or XGBoost, are powerful learning tools for classification tasks. However, while combining multiple trees may provide higher prediction quality than a single one, it sacrifices interpretability, resulting in "black-box" models. In light of this, we aim to develop an interpretable representation of a tree-ensemble model that can provide valuable insights into its behavior. First, given a target tree-ensemble model, we develop a hierarchical visualization tool based on a heatmap representation of the forest's feature use, considering the frequency of a feature and the level at which it is selected as indicators of importance. Next, we propose a mixed-integer linear programming (MILP) formulation for constructing a single optimal multivariate tree that accurately mimics the target model's predictions. The goal is to provide an interpretable surrogate model based on oblique hyperplane splits, which uses only the most relevant features according to the defined forest importance indicators. The MILP model includes a penalty on feature selection, based on feature frequency in the forest, to further induce sparsity of the splits. The natural formulation has been strengthened to improve the computational performance of mixed-integer software. Computational experiments are carried out on benchmark datasets from the UCI repository using a state-of-the-art off-the-shelf solver. Results show that the proposed model is effective in yielding a shallow interpretable tree that approximates the tree-ensemble decision function.
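The following is a minimal sketch, not the authors' implementation, of the first ingredient described in the abstract: for a fitted scikit-learn RandomForestClassifier, count how often each feature is chosen as a split variable at each tree depth, producing the feature-by-level frequency matrix that the paper visualizes as a heatmap. The function name `forest_feature_depth_counts` and the dataset choice are illustrative assumptions, not taken from the paper.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

def forest_feature_depth_counts(forest, n_features, max_depth):
    """Count, over all trees in the forest, how often each feature is used
    as a split variable at each depth level (root = level 0)."""
    counts = np.zeros((n_features, max_depth), dtype=int)
    for est in forest.estimators_:
        tree = est.tree_
        stack = [(0, 0)]  # (node_id, depth)
        while stack:
            node, depth = stack.pop()
            left, right = tree.children_left[node], tree.children_right[node]
            if left == right:          # leaf: both children are -1, no split
                continue
            if depth < max_depth:
                counts[tree.feature[node], depth] += 1
            stack.append((left, depth + 1))
            stack.append((right, depth + 1))
    return counts

X, y = load_breast_cancer(return_X_y=True)
rf = RandomForestClassifier(n_estimators=100, max_depth=4, random_state=0).fit(X, y)
heat = forest_feature_depth_counts(rf, X.shape[1], max_depth=4)
# 'heat' is an (n_features x 4) matrix: features that appear frequently at
# shallow levels are the forest's most relevant variables in the sense used
# by the visualization tool; it can be rendered with matplotlib's imshow or
# seaborn's heatmap.
```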
- A comparison among interpretative proposals for random forests. Machine Learning with Applications, 6:100094.
- Explainable ensemble trees. Computational Statistics, pages 1–17.
- Bennett, K. P. (1992). Decision tree construction via linear programming. In Proceedings of the 4th Midwest Artificial Intelligence and Cognitive Science Society Conference, pages 97–101.
- Optimal classification trees. Machine Learning, 106(7):1039–1082.
- Shattering inequalities for learning optimal decision trees. In Integration of Constraint Programming, Artificial Intelligence, and Operations Research: 19th International Conference, CPAIOR 2022, Los Angeles, CA, USA, June 20-23, 2022, Proceedings, pages 74–90. Springer.
- Breiman, L. (2001). Random forests. Machine Learning, 45(1):5–32.
- Random forests – classification manual (website accessed January 2008).
- Classification and Regression Trees. Chapman and Hall/CRC.
- Born again trees. University of California, Berkeley, Berkeley, CA, Technical Report, 1(2):4.
- A survey on the explainability of supervised machine learning. Journal of Artificial Intelligence Research, 70:245–317.
- Mathematical optimization in classification and regression trees. TOP, 29(1):5–33. Published online: 17 March 2021.
- XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM.
- Towards a rigorous science of interpretable machine learning. arXiv preprint arXiv:1702.08608.
- UCI machine learning repository.
- Margin optimal classification trees. Computers & Operations Research, 161:106441.
- Ehrlinger, J. (2016). ggRandomForests: Exploring random forest survival. arXiv preprint arXiv:1612.08974.
- Friedman, J. H. (2001). Greedy function approximation: A gradient boosting machine. The Annals of Statistics, 29(5):1189–1232.
- Variable selection using random forests. Pattern Recognition Letters, 31(14):2225–2236.
- A survey of methods for explaining black box models. ACM Computing Surveys (CSUR), 51(5):1–42.
- Violin plots: a box plot-density trace synergism. The American Statistician, 52(2):181–184.
- Ho, T. K. (1998). The random subspace method for constructing decision forests. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(8):832–844.
- Constructing optimal binary decision trees is NP-complete. Information Processing Letters, 5:15–17.
- Ishwaran, H. (2007). Variable importance in binary regression trees and forests. Electronic Journal of Statistics, 1:519–537.
- Random survival forests. The Annals of Applied Statistics, 2(3):841–860.
- High-dimensional variable selection for survival data. Journal of the American Statistical Association, 105(489):205–217.
- Classification and Regression by randomForest. R News, 2(3):18–22.
- Understanding variable importances in forests of randomized trees. Advances in Neural Information Processing Systems, 26.
- Margot, F. (2010). Symmetry in Integer Linear Programming, pages 647–686. Springer Berlin Heidelberg, Berlin, Heidelberg.
- Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830.
- Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1:81–106.
- Quinlan, J. R. (1993). C4.5: programs for machine learning. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.
- Interpretable machine learning: Fundamental principles and 10 grand challenges. Statistics Surveys, 16:1–85.
- Surrogate minimal depth as an importance measure for variables in random forests. Bioinformatics, 35(19):3663–3671.
- On the boosting pruning problem. In Machine Learning: ECML 2000, 11th European Conference on Machine Learning, Barcelona, Catalonia, Spain, May 31–June 2, 2000, Proceedings, pages 404–412. Springer.
- Tree space prototypes: Another look at making tree ensembles interpretable. In Proceedings of the 2020 ACM-IMS on Foundations of Data Science Conference, FODS '20, pages 23–34, New York, NY, USA. Association for Computing Machinery.
- Born-again tree ensembles. In International Conference on Machine Learning, pages 9743–9753. PMLR.
- IForest: Interpreting random forests via visual analytics. IEEE Transactions on Visualization and Computer Graphics, 25(1):407–416.
- Ensembling neural networks: Many could be better than all. Artificial Intelligence, 137(1):239–263.
- Giulia Di Teodoro
- Marta Monaci
- Laura Palagi