The Cost of Parallelizing Boosting (2402.15145v1)
Abstract: We study the cost of parallelizing weak-to-strong boosting algorithms for learning, following the recent work of Karbasi and Larsen. Our main results are two-fold:
- First, we prove a tight lower bound, showing that even "slight" parallelization of boosting requires an exponential blow-up in the complexity of training. Specifically, let $\gamma$ be the weak learner's advantage over random guessing. The famous \textsc{AdaBoost} algorithm produces an accurate hypothesis by interacting with the weak learner for $\tilde{O}(1/\gamma^2)$ rounds, where each round runs in polynomial time. Karbasi and Larsen showed that "significant" parallelization must incur an exponential blow-up: any boosting algorithm either interacts with the weak learner for $\Omega(1/\gamma)$ rounds or incurs an $\exp(d/\gamma)$ blow-up in the complexity of training, where $d$ is the VC dimension of the hypothesis class. We close the gap by showing that any boosting algorithm either has $\Omega(1/\gamma^2)$ rounds of interaction or incurs a smaller exponential blow-up of $\exp(d)$.
- Complementing our lower bound, we show that there exists a boosting algorithm using $\tilde{O}(1/(t\gamma^2))$ rounds that suffers a blow-up of only $\exp(d \cdot t^2)$. Plugging in $t = \omega(1)$ shows that the smaller blow-up in our lower bound is tight. More interestingly, this provides the first trade-off between the parallelism and the total work required for boosting.
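The $\tilde{O}(1/\gamma^2)$ round bound above reflects the inherently sequential structure of AdaBoost: each round makes one call to the weak learner on a reweighted training distribution, and the reweighting depends on the hypothesis returned in the previous round, which is exactly the dependency that parallelization must break. Below is a minimal Python sketch of this round structure (an illustration, not a construction from the paper); `weak_learner` is a hypothetical callable that, given a distribution over the training points, returns a hypothesis with edge $\gamma$, i.e. weighted error at most $1/2 - \gamma$.

```python
# Minimal sketch of AdaBoost's sequential rounds of interaction with a weak learner.
# Not the paper's construction; `weak_learner` is a hypothetical callable assumed to
# return, for any distribution w over the m examples, a hypothesis h with
# weighted error at most 1/2 - gamma.
import numpy as np

def adaboost(X, y, weak_learner, T):
    """y in {-1, +1}; weak_learner(X, y, w) -> callable h with h(X) in {-1, +1}."""
    m = len(y)
    w = np.full(m, 1.0 / m)              # distribution over training examples
    hypotheses, alphas = [], []
    for _ in range(T):                    # T sequential rounds of interaction
        h = weak_learner(X, y, w)         # one weak-learner call per round
        pred = h(X)
        err = np.dot(w, pred != y)        # weighted error, assumed <= 1/2 - gamma
        alpha = 0.5 * np.log((1 - err) / max(err, 1e-12))
        w *= np.exp(-alpha * y * pred)    # up-weight misclassified points
        w /= w.sum()                      # renormalize to a distribution
        hypotheses.append(h)
        alphas.append(alpha)
    # Final strong hypothesis: weighted majority vote of the weak hypotheses.
    return lambda Xq: np.sign(sum(a * h(Xq) for a, h in zip(alphas, hypotheses)))
```

With an edge of $\gamma$ in every round, taking $T = O(\log(1/\varepsilon)/\gamma^2)$ rounds in this loop drives the training error below $\varepsilon$; the paper studies how much total work any algorithm must do if it wants to use fewer such rounds.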
- A multiclass boosting framework for achieving fast and provable adversarial robustness. arXiv preprint arXiv:2103.01276, 2021.
- Privacy amplification by subsampling: Tight analyses via couplings and divergences. In NeurIPS, pages 6280–6290, 2018.
- Leo Breiman. Prediction games and arcing algorithms. Neural Computation, 11(7):1493–1517, 1999.
- A boosting approach to reinforcement learning. Advances in Neural Information Processing Systems, 35:33806–33817, 2022.
- XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 785–794, 2016.
- XGBoost: Extreme gradient boosting. R package version 0.4-2, 1(4):1–4, 2015.
- Boosting and differential privacy. In 2010 IEEE 51st Annual Symposium on Foundations of Computer Science, pages 51–60, 2010.
- Yoav Freund. Boosting a weak learning algorithm by majority. Inf. Comput., 121(2):256–285, 1995.
- A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119–139, 1997.
- Jerome H. Friedman. Greedy function approximation: A gradient boosting machine. Annals of Statistics, pages 1189–1232, 2001.
- Boosting in the presence of noise. In Proceedings of the Thirty-Fifth Annual ACM Symposium on Theory of Computing, pages 195–205, 2003.
- Amin Karbasi and Kasper Green Larsen. The impossibility of parallelizing boosting. arXiv preprint arXiv:2301.09627, 2023.
- LightGBM: A highly efficient gradient boosting decision tree. Advances in Neural Information Processing Systems, 30, 2017.
- Michael Kearns. Thoughts on hypothesis boosting. ML class project, 1988.
- Michael Kearns and Leslie G. Valiant. Cryptographic limitations on learning Boolean formulae and finite automata. In Proceedings of the Twenty-First Annual ACM Symposium on Theory of Computing, pages 433–444, 1989.
- On the boosting ability of top-down decision tree learning algorithms. In Proceedings of the Twenty-Eighth Annual ACM Symposium on Theory of Computing, pages 459–468, 1996.
- Kasper Green Larsen. Bagging is an optimal PAC learner. In COLT, volume 195 of Proceedings of Machine Learning Research, pages 450–468. PMLR, 2023.
- Adaptive martingale boosting. Advances in Neural Information Processing Systems, 21, 2008.
- Algorithms and hardness results for parallel large margin learning. In J. Shawe-Taylor, R. Zemel, P. Bartlett, F. Pereira, and K.Q. Weinberger, editors, Advances in Neural Information Processing Systems, volume 24. Curran Associates, Inc., 2011.
- Martingale boosting. In Learning Theory: 18th Annual Conference on Learning Theory, COLT 2005, Bertinoro, Italy, June 27-30, 2005. Proceedings 18, pages 79–94. Springer, 2005.
- Algorithms for parallel boosting. In Fourth International Conference on Machine Learning and Applications (ICMLA'05), 6 pp. IEEE, 2005.
- Boosting using branching programs. Journal of Computer and System Sciences, 64(1):103–112, 2002.
- Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar. Foundations of Machine Learning. MIT Press, 2018.
- Stephen J. Montgomery-Smith. The distribution of Rademacher sums. Proceedings of the American Mathematical Society, 109(2):517–522, 1990.
- Smooth sensitivity and sampling in private data analysis. In STOC, pages 75–84. ACM, 2007.
- Scalable and parallel boosting with mapreduce. IEEE Transactions on Knowledge and Data Engineering, 24(10):1904–1916, 2011.
- Robert E Schapire. The strength of weak learnability. Machine learning, 5:197–227, 1990.
- Federated functional gradient boosting. In International Conference on Artificial Intelligence and Statistics, pages 7814–7840. PMLR, 2022.
- Differentially private feature selection via stability arguments, and the robustness of the lasso. In COLT, volume 30 of JMLR Workshop and Conference Proceedings, pages 819–850. JMLR.org, 2013.
- Vladimir N. Vapnik and Alexey Ya. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. In Measures of Complexity: Festschrift for Alexey Chervonenkis, pages 11–30. Springer, 2015.
- C. Yu and D. B. Skillicorn. Parallelizing boosting and bagging. Technical report, Queen's University, Kingston, Canada, 2001.