Learning Interpretable Rules for Scalable Data Representation and Classification (2310.14336v3)
Abstract: Rule-based models, e.g., decision trees, are widely used in scenarios demanding high model interpretability because of their transparent inner structures and good model expressivity. However, rule-based models are hard to optimize, especially on large data sets, due to their discrete parameters and structures. Ensemble methods and fuzzy/soft rules are commonly used to improve performance, but they sacrifice model interpretability. To obtain both good scalability and interpretability, we propose a new classifier, named Rule-based Representation Learner (RRL), that automatically learns interpretable non-fuzzy rules for data representation and classification. To train the non-differentiable RRL effectively, we project it to a continuous space and propose a novel training method, called Gradient Grafting, that can directly optimize the discrete model using gradient descent. A novel design of logical activation functions is also devised to increase the scalability of RRL and enable it to discretize continuous features end-to-end. Extensive experiments on ten small and four large data sets show that RRL outperforms competitive interpretable approaches and can be easily adjusted to trade off classification accuracy against model complexity for different scenarios. Our code is available at: https://github.com/12wang3/rrl.
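The core training idea above — evaluate the loss on the discrete model, but propagate the resulting gradient through its continuous projection — can be illustrated with a minimal sketch. This is not the paper's implementation: the tiny linear "rule layer", the 0.5 binarization threshold, and all variable names are illustrative assumptions, and the loss gradient is derived by hand for a plain linear map.

```python
import numpy as np

# Hedged sketch of the "gradient grafting" idea from the abstract:
# evaluate the loss on the *discrete* model, but graft the resulting
# gradient onto the *continuous* surrogate's parameters.
# The linear "rule layer" and all names below are illustrative only.

rng = np.random.default_rng(0)
W = rng.random((2, 4))            # continuous parameters in [0, 1]
x = rng.standard_normal((8, 4))   # batch of inputs
y = rng.standard_normal((8, 2))   # targets for this toy sketch

W_disc = (W > 0.5).astype(float)  # non-differentiable discretization

out_disc = x @ W_disc.T           # forward pass of the discrete model
# Gradient of mean-squared error, taken at the *discrete* output...
grad_out = 2.0 * (out_disc - y) / out_disc.size
# ...grafted onto the continuous computation graph: for the surrogate
# out_cont = x @ W.T, the chain rule gives dL/dW = grad_out.T @ x.
grad_W = grad_out.T @ x

lr = 0.1
W -= lr * grad_W                  # gradient-descent step on continuous W
print(grad_W.shape)               # -> (2, 4)
```

The point of the sketch is that the discretization step `(W > 0.5)` has zero gradient almost everywhere, so naive backpropagation through the discrete model fails; grafting sidesteps this by routing the discrete model's loss gradient through the differentiable continuous path.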