A Competitive Algorithm for Agnostic Active Learning (2310.18786v3)
Abstract: For some hypothesis classes and input distributions, active agnostic learning needs exponentially fewer samples than passive learning; for other classes and distributions, it offers little to no improvement. The most popular algorithms for agnostic active learning express their performance in terms of a parameter called the disagreement coefficient, but it is known that these algorithms are inefficient on some inputs. We take a different approach to agnostic active learning, getting an algorithm that is competitive with the optimal algorithm for any binary hypothesis class $H$ and distribution $D_X$ over $X$. In particular, if any algorithm can use $m^*$ queries to get $O(\eta)$ error, then our algorithm uses $O(m^* \log |H|)$ queries to get $O(\eta)$ error. Our algorithm lies in the vein of the splitting-based approach of Dasgupta [2004], which gets a similar result for the realizable ($\eta = 0$) setting. We also show that it is NP-hard to do better than our algorithm's $O(\log |H|)$ overhead in general.
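The splitting-based approach of Dasgupta [2004] that the abstract builds on can be sketched for the realizable ($\eta = 0$) case: greedily query the point whose label most evenly splits the surviving version space, so each answer eliminates as many candidate hypotheses as possible. The sketch below is illustrative only (it assumes a finite pool `X`, a finite class `H`, and a noiseless label oracle; all function names are my own, not the paper's), and it is not the paper's agnostic algorithm, which must also handle label noise.

```python
def greedy_split_learn(X, H, oracle):
    """Greedy version-space splitting (generalized binary search),
    realizable setting: repeatedly query the point on which the
    surviving hypotheses disagree most evenly, so every answer
    discards as many candidates as possible."""
    version_space = list(H)
    queries = 0
    while len(version_space) > 1:
        # Score a point by how balanced the split of the version space is.
        def balance(x):
            pos = sum(1 for h in version_space if h(x))
            return min(pos, len(version_space) - pos)
        x = max(X, key=balance)
        if balance(x) == 0:
            break  # remaining hypotheses agree on every queryable point
        y = oracle(x)  # request the (noiseless) label of x
        version_space = [h for h in version_space if h(x) == y]
        queries += 1
    return version_space[0], queries
```

For example, with `H` the 17 threshold classifiers on `X = {0, ..., 15}`, the loop recovers the hidden threshold in roughly $\log_2 |H|$ queries, matching the realizable-case guarantee the abstract refers to.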
- Mohammad Azad, Igor Chikalov, Shahid Hussain, Mikhail Moshkov, and Beata Zielosko. Decision Trees with Hypotheses. Springer International Publishing, 2022. doi: 10.1007/978-3-031-08585-7.
- Maria-Florina Balcan, Alina Beygelzimer, and John Langford. Agnostic active learning. In Proceedings of the 23rd International Conference on Machine Learning, pages 65–72, 2006.
- Michael Ben-Or and Avinatan Hassidim. The Bayesian learner is optimal for noisy binary search (and pretty good for quantum as well). In 2008 49th Annual IEEE Symposium on Foundations of Computer Science, pages 221–230, 2008. doi: 10.1109/FOCS.2008.58.
- Alina Beygelzimer, Sanjoy Dasgupta, and John Langford. Importance weighted active learning. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 49–56, 2009.
- Alina Beygelzimer, Daniel J. Hsu, John Langford, and Tong Zhang. Agnostic active learning without constraints. Advances in Neural Information Processing Systems, 23, 2010.
- Marat V. Burnashev and Kamil' Sh. Zigangirov. An interval estimation problem for controlled observations. Problemy Peredachi Informatsii, 10(3):51–61, 1974.
- David Cohn, Les Atlas, and Richard Ladner. Improving generalization with active learning. Machine Learning, 15:201–221, 1994.
- Sanjoy Dasgupta. Analysis of a greedy active learning strategy. Advances in Neural Information Processing Systems, 17, 2004.
- Sanjoy Dasgupta. Coarse sample complexity bounds for active learning. Advances in Neural Information Processing Systems, 18, 2005.
- Sanjoy Dasgupta, Daniel J. Hsu, and Claire Monteleoni. A general agnostic active learning algorithm. Advances in Neural Information Processing Systems, 20, 2007.
- Dariusz Dereniowski, Aleksander Łukasiewicz, and Przemysław Uznański. Noisy searching: simple, fast and correct. CoRR, abs/2107.05753, 2021. URL https://arxiv.org/abs/2107.05753.
- Irit Dinur and David Steurer. Analytical approach to parallel repetition. In Proceedings of the Forty-Sixth Annual ACM Symposium on Theory of Computing, pages 624–633, 2014.
- Steve Hanneke. A bound on the label complexity of agnostic active learning. In Proceedings of the 24th International Conference on Machine Learning, pages 353–360, 2007a.
- Steve Hanneke. Teaching dimension and the complexity of active learning. In International Conference on Computational Learning Theory, pages 66–81. Springer, 2007b.
- Steve Hanneke. Theory of disagreement-based active learning. Foundations and Trends® in Machine Learning, 7(2-3):131–309, 2014.
- Steve Hanneke and Liu Yang. Minimax analysis of active learning. Journal of Machine Learning Research, 16(1):3487–3602, 2015.
- Tibor Hegedűs. Generalized teaching dimensions and the query complexity of learning. In Proceedings of the Eighth Annual Conference on Computational Learning Theory, pages 108–117, 1995.
- Matti Kääriäinen. Active learning in the non-realizable case. In Algorithmic Learning Theory: 17th International Conference, ALT 2006, Barcelona, Spain, October 7-10, 2006. Proceedings 17, pages 63–77. Springer, 2006.
- Richard M. Karp and Robert Kleinberg. Noisy binary search and its applications. In Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’07, pages 881–890, USA, 2007. Society for Industrial and Applied Mathematics. ISBN 9780898716245.
- Julian Katz-Samuels, Jifan Zhang, Lalit Jain, and Kevin Jamieson. Improved algorithms for agnostic pool-based active classification. In International Conference on Machine Learning, pages 5334–5344. PMLR, 2021.
- S. Rao Kosaraju, Teresa M. Przytycka, and Ryan Borgstrom. On an optimal split tree problem. In Algorithms and Data Structures: 6th International Workshop, WADS’99, Vancouver, Canada, August 11–14, 1999, Proceedings, pages 157–168. Springer, 2002.
- Laurent Hyafil and Ronald L. Rivest. Constructing optimal binary decision trees is NP-complete. Information Processing Letters, 5(1):15–17, 1976.
- David D. Lewis. A sequential algorithm for training text classifiers: Corrigendum and additional data. In ACM SIGIR Forum, volume 29, pages 13–19. ACM, New York, NY, USA, 1995.
- David D. Lewis and Jason Catlett. Heterogeneous uncertainty sampling for supervised learning. In Machine Learning Proceedings 1994, pages 148–156. Elsevier, 1994.
- Robert Nowak. Generalized binary search. In 2008 46th Annual Allerton Conference on Communication, Control, and Computing, pages 568–574. IEEE, 2008.
- Robert D Nowak. The geometry of generalized binary search. IEEE Transactions on Information Theory, 57(12):7893–7906, 2011.
- Burr Settles. Active learning literature survey. Computer Sciences Technical Report 1648, University of Wisconsin–Madison, 2009.
- Joel Tropp. Freedman’s inequality for matrix martingales. Electronic Communications in Probability, 16:262–270, 2011.
- Roman Vershynin. High-Dimensional Probability: An Introduction with Applications in Data Science, volume 47. Cambridge University Press, 2018.