Optimal Decision Tree and Adaptive Submodular Ranking with Noisy Outcomes (2312.15357v2)
Abstract: In pool-based active learning, the learner is given an unlabeled data set and aims to efficiently learn the unknown hypothesis by querying the labels of the data points. This can be formulated as the classical Optimal Decision Tree (ODT) problem: Given a set of tests, a set of hypotheses, and an outcome for each pair of test and hypothesis, our objective is to find a low-cost testing procedure (i.e., decision tree) that identifies the true hypothesis. This optimization problem has been extensively studied under the assumption that each test generates a deterministic outcome. However, in numerous applications, for example, clinical trials, the outcomes may be uncertain, which renders the ideas from the deterministic setting invalid. In this work, we study a fundamental variant of the ODT problem in which some test outcomes are noisy, even in the more general case where the noise is persistent, i.e., repeating a test gives the same noisy output. Our approximation algorithms provide guarantees that are nearly best possible and hold for the general case of a large number of noisy outcomes per test or per hypothesis where the performance degrades continuously with this number. We numerically evaluated our algorithms for identifying toxic chemicals and learning linear classifiers, and observed that our algorithms have costs very close to the information-theoretic minimum.
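To make the deterministic ODT formulation concrete, the following is a minimal sketch of the classical greedy "splitting" heuristic (generalized binary search) for the noiseless setting: repeatedly run the test whose largest outcome class among the surviving hypotheses is smallest, then discard hypotheses inconsistent with the observed outcome. This is background illustration only, not the paper's noise-tolerant algorithm; the test and hypothesis names are invented.

```python
def identify(outcomes, true_hyp):
    """Greedy splitting for noiseless ODT.

    outcomes[t][h] = deterministic outcome of test t on hypothesis h.
    Returns (identified hypothesis, list of tests performed)."""
    alive = set(next(iter(outcomes.values())))  # hypotheses still consistent
    used = []
    while len(alive) > 1:
        def worst_class(t):
            # Size of the largest outcome class of test t among survivors:
            # smaller is better, since every outcome then prunes more.
            groups = {}
            for h in alive:
                groups.setdefault(outcomes[t][h], set()).add(h)
            return max(len(g) for g in groups.values())
        t = min((t for t in outcomes if t not in used), key=worst_class)
        used.append(t)
        result = outcomes[t][true_hyp]  # query the (unknown) true hypothesis
        alive = {h for h in alive if outcomes[t][h] == result}
    return alive.pop(), used

# Toy instance: three hypotheses, two binary tests.
tests = {
    "t1": {"h1": 0, "h2": 0, "h3": 1},
    "t2": {"h1": 0, "h2": 1, "h3": 1},
}
print(identify(tests, "h2"))  # → ('h2', ['t1', 't2'])
```

Note that with persistent noise this pruning step is unsound: a noisy outcome can eliminate the true hypothesis permanently, which is precisely why the deterministic ideas break down and the paper's setting requires different techniques.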