Margin Optimal Classification Trees

Published 19 Oct 2022 in math.OC and cs.LG (arXiv:2210.10567v5)

Abstract: In recent years, interpretable machine learning models, which can provide explanatory insights into their behaviour, have attracted growing attention. Thanks to their interpretability, decision trees have been intensively studied for classification tasks, and, owing to remarkable advances in mixed-integer programming (MIP), various approaches have been proposed to formulate the problem of training an Optimal Classification Tree (OCT) as a MIP model. We present a novel mixed-integer quadratic formulation for the OCT problem which exploits the generalization capabilities of Support Vector Machines for binary classification. Our model, denoted Margin Optimal Classification Tree (MARGOT), encompasses maximum-margin multivariate hyperplanes nested in a binary tree structure. To enhance the interpretability of our approach, we analyse two alternative versions of MARGOT which include feature selection constraints inducing sparsity of the hyperplanes' coefficients. First, MARGOT is tested on non-linearly separable synthetic datasets in a 2-dimensional feature space, providing a graphical representation of the maximum-margin approach. Then, the proposed models are tested on benchmark datasets from the UCI repository. The MARGOT formulation turns out to be easier to solve than other OCT approaches, and the generated tree generalizes better on new observations. The two interpretable versions effectively select the most relevant features while maintaining good prediction quality.
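The core model class the abstract describes, maximum-margin multivariate hyperplanes nested in a binary tree, can be sketched in a few lines. The sketch below is only an illustration and is NOT the paper's method: MARGOT trains all of the tree's hyperplanes jointly as a single mixed-integer quadratic program, whereas this toy fits each node greedily with a Pegasos-style hinge-loss subgradient method. All function names and hyperparameters here are this sketch's own assumptions.

```python
# Illustrative sketch only: a greedy "margin tree" where each internal node
# holds one soft-margin linear separator (w, b). MARGOT instead optimizes all
# hyperplanes jointly via a MIQP; this toy only shows the model structure.
import random

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train_margin_split(X, y, lam=0.01, epochs=300, seed=0):
    """Approximate a soft-margin linear SVM with the Pegasos subgradient method."""
    rng = random.Random(seed)
    w, b, t = [0.0] * len(X[0]), 0.0, 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * (dot(w, X[i]) + b)
            w = [(1.0 - eta * lam) * wj for wj in w]   # gradient of (lam/2)||w||^2
            if margin < 1.0:                           # hinge-loss violation
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * y[i]
    return w, b

def grow_tree(X, y, depth):
    """Greedy binary tree: each internal node routes points by sign(w.x + b)."""
    majority = max(set(y), key=y.count)
    if depth == 0 or len(set(y)) == 1:
        return ("leaf", majority)
    w, b = train_margin_split(X, y)
    left = [i for i in range(len(X)) if dot(w, X[i]) + b < 0.0]
    right = [i for i in range(len(X)) if dot(w, X[i]) + b >= 0.0]
    if not left or not right:                          # degenerate split: stop
        return ("leaf", majority)
    return ("node", w, b,
            grow_tree([X[i] for i in left], [y[i] for i in left], depth - 1),
            grow_tree([X[i] for i in right], [y[i] for i in right], depth - 1))

def predict(tree, x):
    while tree[0] == "node":
        _, w, b, lo, hi = tree
        tree = lo if dot(w, x) + b < 0.0 else hi
    return tree[1]

# Tiny linearly separable toy set: a single max-margin split suffices here.
X = [(-2.0, -2.0), (-3.0, -1.0), (-1.0, -3.0), (2.0, 2.0), (3.0, 1.0), (1.0, 3.0)]
y = [-1, -1, -1, 1, 1, 1]
tree = grow_tree(X, y, depth=2)
print([predict(tree, x) for x in X])
```

The sparse variants mentioned in the abstract would, in this picture, additionally constrain how many coordinates of each node's `w` may be nonzero; the paper imposes that inside the MIP rather than via a heuristic.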
