Orthogonal Gradient Boosting for Simpler Additive Rule Ensembles (2402.15691v1)
Abstract: Gradient boosting of prediction rules is an efficient approach to learning potentially interpretable yet accurate probabilistic models. However, actual interpretability requires limiting the number and size of the generated rules, and existing boosting variants are not designed for this purpose. Although corrective boosting refits all rule weights in each iteration to minimise prediction risk, the included rule conditions tend to be sub-optimal, because commonly used objective functions fail to anticipate this refitting. Here, we address this issue with a new objective function that measures the angle between the risk gradient vector and the projection of the condition output vector onto the orthogonal complement of the already selected conditions. This approach correctly approximates the ideal update of adding the risk gradient itself to the model and favours the inclusion of more general, and thus shorter, rules. As we demonstrate on a wide range of prediction tasks, this significantly improves the comprehensibility/accuracy trade-off of the fitted ensemble. Additionally, we show how objective values for related rule conditions can be computed incrementally to avoid any substantial computational overhead of the new method.
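The objective described in the abstract can be sketched numerically. The function below is a hypothetical illustration (not the paper's implementation): given the risk gradient vector `g`, a candidate condition's output vector `q`, and the matrix `Q` whose columns are the output vectors of the already selected conditions, it projects `q` onto the orthogonal complement of the span of `Q` and returns the absolute cosine of the angle between that projection and `g`. All names are assumptions for illustration.

```python
import numpy as np

def orthogonal_objective(g, q, Q=None):
    """Illustrative sketch: |cos(angle)| between the risk gradient g and
    the projection of the candidate condition output q onto the orthogonal
    complement of the span of the selected condition outputs (columns of Q).
    This is an assumed formulation, not the paper's reference code."""
    if Q is not None and Q.size:
        # Orthonormal basis for the span of the selected conditions.
        U, _ = np.linalg.qr(Q)
        # Remove the component of q that lies in that span.
        q = q - U @ (U.T @ q)
    norm = np.linalg.norm(q)
    if norm == 0.0:
        # q is entirely explained by already selected conditions.
        return 0.0
    return abs(g @ q) / (np.linalg.norm(g) * norm)
```

A candidate whose output is already representable by the selected conditions scores zero, so the criterion naturally favours conditions that add new directions aligned with the gradient:

```python
g = np.array([1.0, -1.0, 0.5, 0.0])              # risk gradient
q_new = np.array([1.0, 0.0, 1.0, 0.0])           # candidate condition output
Q = np.array([[1.0], [1.0], [1.0], [1.0]])       # intercept already selected
score = orthogonal_objective(g, q_new, Q)        # in [0, 1]
```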