Knowledge Distillation Decision Tree for Unravelling Black-box Machine Learning Models (2206.04661v4)
Abstract: Machine learning models, particularly black-box models, are widely favored for their outstanding predictive capabilities. However, they often face scrutiny and criticism due to their lack of interpretability. Paradoxically, their strong predictive performance may indicate a deep understanding of the underlying data, implying significant potential for interpretation. Leveraging the emerging concept of knowledge distillation, we introduce the knowledge distillation decision tree (KDDT). This method distills knowledge about the data from a black-box model into a decision tree, thereby facilitating the interpretation of the black-box model. Essential attributes of a good interpretable model include simplicity, stability, and predictivity. The primary challenge in constructing an interpretable tree lies in ensuring structural stability under the randomness of the training data. KDDT is developed with theoretical foundations demonstrating that structural stability can be achieved under mild assumptions. Furthermore, we propose the hybrid KDDT to achieve both simplicity and predictivity, and provide an efficient algorithm for its construction. Simulation studies and a real-data analysis validate the hybrid KDDT's ability to deliver accurate and reliable interpretations. KDDT is an excellent interpretable model with great potential for practical applications.
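To make the core idea concrete, the sketch below illustrates the generic surrogate-tree form of knowledge distillation that the abstract describes: a shallow decision tree is fit to the predictions of a black-box "teacher" model rather than to the original labels. This is only a minimal illustration under assumed choices (scikit-learn, a gradient-boosting teacher, the synthetic `make_friedman1` data, a depth-3 student tree), not the paper's KDDT algorithm or its stability machinery.

```python
# Minimal sketch of distilling a black-box model into a decision tree.
# NOTE: this is a generic illustration, not the paper's KDDT method;
# the teacher, dataset, and tree depth are illustrative assumptions.
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor, export_text

# Synthetic regression data standing in for the training data.
X, y = make_friedman1(n_samples=2000, noise=1.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 1. Fit the black-box teacher on the original labels.
teacher = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)

# 2. Distill: relabel the data with the teacher's predictions and fit a
#    shallow, interpretable student tree to those pseudo-labels.
student = DecisionTreeRegressor(max_depth=3, random_state=0)
student.fit(X_train, teacher.predict(X_train))

# 3. Read the student tree as an interpretable proxy for the teacher,
#    and check how faithfully it reproduces the teacher's predictions.
print(export_text(student, feature_names=[f"x{i}" for i in range(X.shape[1])]))
print("teacher R^2 on test labels:", teacher.score(X_test, y_test))
print("student fidelity to teacher:", student.score(X_test, teacher.predict(X_test)))
```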