Properly Learning Decision Trees with Queries Is NP-Hard (2307.04093v1)
Abstract: We prove that it is NP-hard to properly PAC learn decision trees with queries, resolving a longstanding open problem in learning theory (Bshouty 1993; Guijarro-Lavin-Raghavan 1999; Mehta-Raghavan 2002; Feldman 2016). While there has been a long line of work, dating back to (Pitt-Valiant 1988), establishing the hardness of properly learning decision trees from random examples, the more challenging setting of query learners necessitates different techniques, and no lower bounds were previously known. En route to our main result, we simplify and strengthen the best known lower bounds for the related problem of Decision Tree Minimization (Zantema-Bodlaender 2000; Sieling 2003). On a technical level, we introduce the notion of hardness distillation, which we study for decision tree complexity but which can be considered for any complexity measure: for a function that requires large decision trees, we give a general method for identifying a small set of inputs that is responsible for its complexity. Our technique even rules out query learners that are allowed constant error. This contrasts with existing lower bounds for the setting of random examples, which only hold for inverse-polynomial error. Our result, taken together with a recent almost-polynomial-time query algorithm for properly learning decision trees under the uniform distribution (Blanc-Lange-Qiao-Tan 2022), demonstrates the dramatic impact of distributional assumptions on the problem.
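To make the notion of hardness distillation concrete, here is a minimal illustrative sketch, not the paper's construction: a brute-force computation of the minimum number of leaves of any decision tree consistent with a given set of labeled inputs. In the abstract's terms, a distilled set for a function f is a small set of labeled inputs on which this quantity is already as large as the decision tree complexity of f itself. The helper name `min_leaves` and the parity example are assumptions made for illustration only.

```python
# Illustrative sketch of the quantity behind "hardness distillation":
# the minimum leaf count of any decision tree consistent with a partial
# function, given as a set of (input, label) pairs. This is NOT the
# paper's method, just a brute-force certificate checker.
import itertools
from functools import lru_cache


@lru_cache(maxsize=None)
def min_leaves(points):
    """points: frozenset of (input_tuple, label) pairs over {0,1}^n."""
    labels = {label for _, label in points}
    if len(labels) <= 1:
        return 1  # a single constant leaf is consistent with the set
    n = len(next(iter(points))[0])
    best = float("inf")
    for i in range(n):  # try querying each variable at the root
        zeros = frozenset(p for p in points if p[0][i] == 0)
        ones = frozenset(p for p in points if p[0][i] == 1)
        if zeros and ones:  # skip queries that do not split the set
            best = min(best, min_leaves(zeros) + min_leaves(ones))
    return best


if __name__ == "__main__":
    # Full truth table of 3-bit parity: every consistent tree is complete.
    parity = frozenset(
        (x, sum(x) % 2) for x in itertools.product((0, 1), repeat=3)
    )
    print(min_leaves(parity))  # 8 = 2^3 leaves

    # Dropping even one labeled input lowers the bound for parity, so no
    # proper subset certifies parity's full complexity.
    print(min_leaves(frozenset(list(parity)[:-1])))  # 7
```

As the parity example suggests, a small certifying set need not exist for every hard function; per the abstract, the paper's contribution is a general method for finding such small responsible sets for the functions it constructs.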
- Misha Alekhnovich, Mark Braverman, Vitaly Feldman, Adam Klivans, and Toniann Pitassi. The complexity of properly learning simple concept classes. Journal of Computer and System Sciences, 74(1):16–34, 2009. Preliminary version in FOCS 2004.
- Micah Adler and Brent Heeringa. Approximating optimal binary decision trees. Algorithmica, 62(3-4):1112–1121, 2012.
- Sanjeev Arora, Carsten Lund, Rajeev Motwani, Madhu Sudan, and Mario Szegedy. Proof verification and the hardness of approximation problems. Journal of the ACM, 45(3):501–555, 1998.
- Dana Angluin. Remarks on the difficulty of finding a minimal disjunctive normal form for Boolean functions. Unpublished manuscript.
- Dana Angluin. Queries and concept learning. Machine Learning, 2:319–342, 1988.
- Sanjeev Arora and Shmuel Safra. Probabilistic checking of proofs: A new characterization of NP. Journal of the ACM, 45(1):70–122, 1998.
- Nader H. Bshouty and Lynn Burroughs. On the proper learning of axis-parallel concepts. Journal of Machine Learning Research, 4:157–176, 2003.
- Anselm Blumer, Andrzej Ehrenfeucht, David Haussler, and Manfred K. Warmuth. Learnability and the Vapnik-Chervonenkis dimension. Journal of the ACM, 36(4):929–965, 1989.
- Avrim Blum, Merrick Furst, Jeffrey Jackson, Michael Kearns, Yishay Mansour, and Steven Rudich. Weakly learning DNF and characterizing statistical query learning using Fourier analysis. In Proceedings of the 26th Annual ACM Symposium on Theory of Computing (STOC), pages 253–262, 1994.
- Osbert Bastani, Carolyn Kim, and Hamsa Bastani. Interpretability via model extraction. In Proceedings of the 4th Workshop on Fairness, Accountability, and Transparency in Machine Learning (FAT/ML), 2017.
- Guy Blanc, Jane Lange, Mingda Qiao, and Li-Yang Tan. Decision tree heuristics can fail, even in the smoothed setting. In Proceedings of the 25th International Conference on Randomization and Computation (RANDOM), volume 207, pages 45:1–45:16, 2021.
- Guy Blanc, Jane Lange, Mingda Qiao, and Li-Yang Tan. Properly learning decision trees in almost polynomial time. Journal of the ACM, 69(6):39:1–39:19, 2022.
- Leo Breiman. Statistical modeling: The two cultures (with comments and a rejoinder by the author). Statistical Science, 16(3):199–231, 2001.
- Leo Breiman and Nong Shang. Born again trees. Technical report, University of California, Berkeley, 1996.
- Nader Bshouty. Exact learning via the monotone theory. In Proceedings of the 34th Annual Symposium on Foundations of Computer Science (FOCS), pages 302–311, 1993.
- Nader H. Bshouty. Superpolynomial lower bounds for learning monotone classes. Electronic Colloquium on Computational Complexity (ECCC), TR23-006, 2023.
- Eli Ben-Sasson and Madhu Sudan. Short PCPs with polylog query complexity. SIAM Journal on Computing, 38(2):551–607, 2008.
- Venkatesan T. Chakaravarthy, Vinayaka Pandit, Sambuddha Roy, Pranjal Awasthi, and Mukesh Mohania. Decision trees for entity identification: Approximation algorithms and hardness results. In Proceedings of the 26th ACM Symposium on Principles of Database Systems (PODS), pages 53–62, 2007.
- Mark Craven and Jude Shavlik. Extracting tree-structured representations of trained networks. In Advances in Neural Information Processing Systems 8 (NeurIPS), pages 24–30, 1995.
- Irit Dinur. The PCP theorem by gap amplification. Journal of the ACM, 54(3), Article 12, 2007.
- Andrzej Ehrenfeucht and David Haussler. Learning decision trees from random examples. Information and Computation, 82(3):231–246, 1989.
- Vitaly Feldman. Hardness of approximate two-level logic minimization and PAC learning with membership queries. In Proceedings of the 38th Annual ACM Symposium on Theory of Computing (STOC), pages 363–372, 2006.
- Vitaly Feldman. Hardness of proper learning. In Encyclopedia of Algorithms, pages 897–900. Springer, 2016.
- Nicholas Frosst and Geoffrey Hinton. Distilling a neural network into a soft decision tree. arXiv preprint arXiv:1711.09784, 2017.
- Oded Goldreich, Shafi Goldwasser, and Dana Ron. Property testing and its connection to learning and approximation. Journal of the ACM, 45(4):653–750, 1998.
- Michael R. Garey and David S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman, 1979.
- David Guijarro, Victor Lavin, and Vijay Raghavan. Exact learning when irrelevant variables abound. Information Processing Letters, 70(5):233–239, 1999.
- David Haussler. Quantifying inductive bias: AI learning algorithms and Valiant's learning framework. Artificial Intelligence, 36(2):177–221, 1988.
- Thomas Hancock, Tao Jiang, Ming Li, and John Tromp. Lower bounds on learning decision lists and trees. Information and Computation, 126(2):114–122, 1996.
- Laurent Hyafil and Ronald L. Rivest. Constructing optimal binary decision trees is NP-complete. Information Processing Letters, 5(1):15–17, 1976.
- Eyal Kushilevitz and Yishay Mansour. Learning decision trees using the Fourier spectrum. SIAM Journal on Computing, 22(6):1331–1348, 1993.
- S. Rao Kosaraju, Teresa M. Przytycka, and Ryan Borgstrom. On an optimal split tree problem. In Proceedings of the 6th International Workshop on Algorithms and Data Structures (WADS), pages 157–168. Springer, 1999.
- Caleb Koch, Carmen Strassle, and Li-Yang Tan. Superpolynomial lower bounds for decision tree learning and testing. In Proceedings of the 2023 ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1962–1994, 2023.
- Leonid A. Levin. Universal sorting problems. Problemy Peredachi Informatsii, 9:265–266, 1973.
- Eduardo S. Laber and Loana Tito Nogueira. On the hardness of the minimum height decision tree problem. Discrete Applied Mathematics, 144(1-2):209–212, 2004.
- Dinesh Mehta and Vijay Raghavan. Decision tree approximations of Boolean functions. Theoretical Computer Science, 270(1-2):609–623, 2002.
- Leonard Pitt and Leslie Valiant. Computational limitations on learning from examples. Journal of the ACM, 35(4):965–984, 1988.
- Christos Papadimitriou and Mihalis Yannakakis. Optimization, approximation, and complexity classes. Journal of Computer and System Sciences, 43(3):425–440, 1991.
- Netanel Raviv. Truth table minimization of computational models. arXiv preprint arXiv:1306.3766, 2013.
- Cynthia Rudin, Chaofan Chen, Zhi Chen, Haiyang Huang, Lesia Semenova, and Chudi Zhong. Interpretable machine learning: Fundamental principles and 10 grand challenges. Statistics Surveys, 16:1–85, 2022.
- The hardness of the expected decision depth problem. Information Processing Letters, 101(3):112–118, 2007.
- Detlef Sieling. Minimization of decision trees is hard to approximate. Journal of Computer and System Sciences, 74(3):394–403, 2008.
- Robert Schapire and Linda Sellie. Learning sparse multivariate polynomials over a field with queries and counterexamples. In Proceedings of the 6th Annual Conference on Computational Learning Theory (COLT), pages 17–26, 1993.
- Luca Trevisan. Inapproximability of Combinatorial Optimization Problems, chapter 13, pages 381–434. John Wiley & Sons, Ltd, 2014.
- Anneleen Van Assche and Hendrik Blockeel. Seeing the forest through the trees: Learning a comprehensible model from an ensemble. In European Conference on Machine Learning (ECML), pages 418–429, 2007.
- Leslie Valiant. A theory of the learnable. Communications of the ACM, 27(11):1134–1142, 1984.
- Leslie G. Valiant. Learning disjunctions of conjunctions. In Proceedings of the 9th International Joint Conference on Artificial Intelligence (IJCAI), pages 560–566, 1985.
- A genetic algorithm for interpretable model extraction from decision tree ensembles. In Trends and Applications in Knowledge Discovery and Data Mining, pages 104–115, 2017.
- Thibaut Vidal and Maximilian Schiffer. Born-again tree ensembles. In Proceedings of the 37th International Conference on Machine Learning (ICML), pages 9743–9753, 2020.
- Hans Zantema and Hans L. Bodlaender. Finding small equivalent decision trees is hard. International Journal of Foundations of Computer Science, 11(2):343–354, 2000.
- Yichen Zhou and Giles Hooker. Interpreting models via single tree approximation, 2016.