
An improved column-generation-based matheuristic for learning classification trees (2308.11477v2)

Published 22 Aug 2023 in cs.LG, cs.AI, and math.OC

Abstract: Decision trees are highly interpretable models for solving classification problems in machine learning (ML). The standard ML algorithms for training decision trees are fast but generate suboptimal trees in terms of accuracy. Other discrete optimization models in the literature address the optimality problem but only work well on relatively small datasets. Firat et al. (2020) proposed a column-generation-based heuristic approach for learning decision trees. This approach improves scalability and can work with large datasets. In this paper, we describe improvements to this column generation approach. First, we modify the subproblem model to significantly reduce the number of subproblems in multiclass classification instances. Next, we show that the data-dependent constraints in the master problem are implied, and use them as cutting planes. Furthermore, we describe a separation model to generate data points for which the linear programming relaxation solution violates their corresponding constraints. We conclude by presenting computational results that show that these modifications result in better scalability.
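
The cutting-plane idea mentioned in the abstract can be illustrated with a minimal, generic sketch. It is not the paper's separation model: the abstract describes an optimization model for separation, whereas this sketch simply enumerates a pool of constraints. The function name `separate`, the constraint form `a . x <= b`, and the parameters `tol` and `max_cuts` are all assumptions made for illustration.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical representation: each constraint is (a, b), meaning a . x <= b.
Constraint = Tuple[Sequence[float], float]


def separate(
    x: Sequence[float],
    constraints: Sequence[Constraint],
    tol: float = 1e-9,
    max_cuts: Optional[int] = None,
) -> List[int]:
    """Return indices of constraints violated by x, most violated first.

    This is separation by plain enumeration, used here only to show the
    role a separation step plays in a cutting-plane loop.
    """
    violations = []
    for idx, (a, b) in enumerate(constraints):
        lhs = sum(ai * xi for ai, xi in zip(a, x))
        if lhs > b + tol:
            violations.append((lhs - b, idx))
    # Sort by decreasing violation, breaking ties by constraint index.
    violations.sort(key=lambda v: (-v[0], v[1]))
    if max_cuts is not None:
        violations = violations[:max_cuts]
    return [idx for _, idx in violations]


# Toy relaxation solution and constraint pool (made-up numbers).
x_lp = [0.5, 0.5, 1.0]
pool = [
    ([1.0, 1.0, 0.0], 1.0),  # satisfied: 1.0 <= 1.0
    ([1.0, 0.0, 1.0], 1.0),  # violated:  1.5 >  1.0
    ([0.0, 1.0, 1.0], 1.2),  # violated:  1.5 >  1.2
]
cuts = separate(x_lp, pool)
```

In a cutting-plane loop, the constraints returned by such a routine would be added to the restricted problem and the relaxation re-solved, repeating until no violated constraint remains.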

References (25)
  1. Strong optimal classification trees. arXiv preprint arXiv:2103.15965.
  2. Branch-and-price: Column generation for solving huge integer programs. Operations Research 46, 316–329.
  3. Optimal classification trees. Machine Learning 106, 1039–1082.
  4. Robust optimal classification trees under noisy labels. Advances in Data Analysis and Classification 16, 155–179.
  5. Multiclass optimal classification trees with SVM-splits. Machine Learning, 1–24.
  6. Optimal randomized classification trees. Computers & Operations Research 132, 105281.
  7. Classification and regression trees. Wadsworth & Brooks / Cole Advanced Books and Software, Monterey, CA.
  8. Mathematical optimization in classification and regression trees. Top 29, 5–33.
  9. Margin optimal classification trees. Computers & Operations Research 161, 106441.
  10. UCI machine learning repository. URL: http://archive.ics.uci.edu/ml.
  11. Column generation based heuristic for learning classification trees. Computers & Operations Research 116, 104866.
  12. Ambros-Gleixner/mipcc23: The MIP workshop 2023 computational competition. URL: https://github.com/ambros-gleixner/MIPcc23.
  13. Google, 2021. OR-Tools. https://developers.google.com/optimization/.
  14. Optimal decision trees for categorical data via integer programming. Journal of Global Optimization 81, 233–260.
  15. Gurobi Optimization, 2021. Gurobi Optimizer Reference Manual. https://www.gurobi.com.
  16. Column generation based primal heuristics. Electronic Notes in Discrete Mathematics 36, 695–702.
  17. Constructing optimal binary decision trees is NP-complete. Information Processing Letters 5, 15–17.
  18. Learning optimal decision trees with SAT, in: Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, pp. 1362–1368.
  19. Mining optimal decision trees from itemset lattices, in: Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 530–539.
  20. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research 12, 2825–2830.
  21. Induction of decision trees. Machine learning 1, 81–106.
  22. Primal heuristics for branch and price: The assets of diving methods. INFORMS Journal on Computing 31, 251–267.
  23. Learning optimal decision trees using constraint programming. Constraints 25, 226–250.
  24. Learning decision trees with flexible constraints and objectives using integer optimization, in: Salvagnin, D., Lombardi, M. (Eds.), Integration of AI and OR Techniques in Constraint Programming, CPAIOR 2017, Springer International Publishing, Cham. pp. 94–103.
  25. Learning optimal classification trees using a binary linear program formulation, in: Proceedings of the AAAI conference on artificial intelligence, pp. 1625–1632.