Learning accurate and interpretable decision trees (2405.15911v1)

Published 24 May 2024 in cs.LG

Abstract: Decision trees are a popular tool in machine learning and yield easy-to-understand models. Several techniques have been proposed in the literature for learning a decision tree classifier, with different techniques working well for data from different domains. In this work, we develop approaches to design decision tree learning algorithms given repeated access to data from the same domain. We propose novel parameterized classes of node splitting criteria in top-down algorithms, which interpolate between popularly used entropy and Gini impurity based criteria, and provide theoretical bounds on the number of samples needed to learn the splitting function appropriate for the data at hand. We also study the sample complexity of tuning prior parameters in Bayesian decision tree learning, and extend our results to decision tree regression. We further consider the problem of tuning hyperparameters in pruning the decision tree for classical pruning algorithms including min-cost complexity pruning. We also study the interpretability of the learned decision trees and introduce a data-driven approach for optimizing the explainability versus accuracy trade-off using decision trees. Finally, we demonstrate the significance of our approach on real world datasets by learning data-specific decision trees which are simultaneously more accurate and interpretable.
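
As a concrete illustration of a splitting-criterion family that interpolates between the entropy and Gini impurity criteria mentioned in the abstract, the sketch below uses the Tsallis entropy, a standard family governed by a single parameter alpha: alpha -> 1 recovers Shannon entropy and alpha = 2 recovers Gini impurity. This is a minimal Python sketch of the general idea, not the paper's exact parameterized classes; the function names `tsallis_impurity` and `split_gain` and the NumPy-only setup are assumptions made here for illustration.

```python
import numpy as np

def tsallis_impurity(class_probs, alpha):
    """Tsallis-style impurity: alpha -> 1 recovers Shannon entropy (in nats),
    alpha = 2 gives Gini impurity. Illustrative only; not the paper's exact
    parameterization."""
    p = np.asarray(class_probs, dtype=float)
    p = p[p > 0]  # drop empty classes so log/power terms are well defined
    if np.isclose(alpha, 1.0):
        return float(-np.sum(p * np.log(p)))          # Shannon-entropy limit
    return float((1.0 - np.sum(p ** alpha)) / (alpha - 1.0))

def split_gain(left_counts, right_counts, alpha):
    """Impurity reduction of a candidate binary split under a given alpha."""
    left = np.asarray(left_counts, dtype=float)
    right = np.asarray(right_counts, dtype=float)
    parent = left + right
    n_l, n_r, n = left.sum(), right.sum(), parent.sum()
    imp = lambda counts: tsallis_impurity(counts / counts.sum(), alpha)
    return imp(parent) - (n_l / n) * imp(left) - (n_r / n) * imp(right)

# The same split scored under a Gini-like (alpha = 2) and an
# entropy-like (alpha -> 1) criterion.
print(split_gain([40, 10], [10, 40], alpha=2.0))  # ~0.18 (Gini gain)
print(split_gain([40, 10], [10, 40], alpha=1.0))  # ~0.19 (information gain, nats)
```

In the data-driven setting the abstract describes, a parameter like alpha becomes a tunable quantity chosen using repeated samples from the same domain rather than fixed a priori; the paper's sample-complexity results bound how much data such tuning requires.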
