The Cost of Parallelizing Boosting (2402.15145v1)

Published 23 Feb 2024 in cs.LG and cs.DS

Abstract: We study the cost of parallelizing weak-to-strong boosting algorithms for learning, following the recent work of Karbasi and Larsen. Our main results are two-fold:

- First, we prove a tight lower bound, showing that even "slight" parallelization of boosting requires an exponential blow-up in the complexity of training. Specifically, let $\gamma$ be the weak learner's advantage over random guessing. The famous AdaBoost algorithm produces an accurate hypothesis by interacting with the weak learner for $\tilde{O}(1/\gamma^2)$ rounds, where each round runs in polynomial time. Karbasi and Larsen showed that "significant" parallelization must incur an exponential blow-up: any boosting algorithm either interacts with the weak learner for $\Omega(1/\gamma)$ rounds or incurs an $\exp(d/\gamma)$ blow-up in the complexity of training, where $d$ is the VC dimension of the hypothesis class. We close the gap by showing that any boosting algorithm either has $\Omega(1/\gamma^2)$ rounds of interaction or incurs a smaller exponential blow-up of $\exp(d)$.
- Complementing our lower bound, we show that there exists a boosting algorithm using $\tilde{O}(1/(t\gamma^2))$ rounds that suffers a blow-up of only $\exp(d \cdot t^2)$. Plugging in $t = \omega(1)$, this shows that the smaller blow-up in our lower bound is tight. More interestingly, this provides the first trade-off between the parallelism and the total work required for boosting.
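
To make "rounds of interaction" concrete, the sketch below illustrates the sequential weak-learner interaction pattern of AdaBoost that the lower bound concerns: roughly $\ln(n)/(2\gamma^2)$ calls to the weak learner, where each call depends on the sample weights produced by the previous one. This is a minimal illustrative sketch; the decision-stump weak learner and the synthetic data are assumptions for demonstration and are not part of the paper's construction.

```python
# Minimal AdaBoost sketch: T ~ ln(n) / (2 gamma^2) *sequential* calls to the
# weak learner, each depending on the weights from the previous round.
# The stump learner and toy data below are illustrative assumptions only.

import numpy as np

def stump_weak_learner(X, y, w):
    """Return the best threshold stump under sample weights w.
    Plays the role of the gamma-advantage weak learner."""
    best, best_err = None, np.inf
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for sign in (1, -1):
                pred = sign * np.where(X[:, j] <= thr, 1, -1)
                err = np.sum(w[pred != y])
                if err < best_err:
                    best_err, best = err, (j, thr, sign)
    j, thr, sign = best
    return lambda Z: sign * np.where(Z[:, j] <= thr, 1, -1), best_err

def adaboost(X, y, gamma=0.1):
    n = len(y)
    # Round budget ~ ln(n) / (2 gamma^2): the sequential bottleneck the
    # parallelization lower bound is about.
    T = int(np.ceil(np.log(n) / (2 * gamma ** 2)))
    w = np.full(n, 1.0 / n)
    hypotheses, alphas = [], []
    for _ in range(T):
        # Round t cannot start before round t-1: it needs the updated weights.
        h, err = stump_weak_learner(X, y, w)
        err = min(max(err, 1e-12), 1 - 1e-12)
        alpha = 0.5 * np.log((1 - err) / err)
        w *= np.exp(-alpha * y * h(X))   # reweight examples for the next round
        w /= w.sum()
        hypotheses.append(h)
        alphas.append(alpha)
    return lambda Z: np.sign(sum(a * h(Z) for a, h in zip(alphas, hypotheses)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2))
    y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)   # toy labels
    H = adaboost(X, y, gamma=0.1)
    print("training error:", np.mean(H(X) != y))
```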

References (31)
  1. A multiclass boosting framework for achieving fast and provable adversarial robustness. arXiv preprint arXiv:2103.01276, 2021.
  2. Privacy amplification by subsampling: Tight analyses via couplings and divergences. In NeurIPS, pages 6280–6290, 2018.
  3. Leo Breiman. Prediction games and arcing algorithms. Neural Computation, 11(7):1493–1517, 1999.
  4. A boosting approach to reinforcement learning. Advances in Neural Information Processing Systems, 35:33806–33817, 2022.
  5. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 785–794, 2016.
  6. XGBoost: Extreme gradient boosting. R package version 0.4-2, 1(4):1–4, 2015.
  7. Boosting and differential privacy. In 2010 IEEE 51st Annual Symposium on Foundations of Computer Science, pages 51–60, 2010.
  8. Yoav Freund. Boosting a weak learning algorithm by majority. Information and Computation, 121(2):256–285, 1995.
  9. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119–139, 1997.
  10. Jerome H. Friedman. Greedy function approximation: A gradient boosting machine. Annals of Statistics, pages 1189–1232, 2001.
  11. Boosting in the presence of noise. In Proceedings of the Thirty-Fifth Annual ACM Symposium on Theory of Computing, pages 195–205, 2003.
  12. Amin Karbasi and Kasper Green Larsen. The impossibility of parallelizing boosting. arXiv preprint arXiv:2301.09627, 2023.
  13. LightGBM: A highly efficient gradient boosting decision tree. Advances in Neural Information Processing Systems, 30, 2017.
  14. M. Kearns. Thoughts on hypothesis boosting. ML class project, 1988.
  15. M. Kearns and L. G. Valiant. Cryptographic limitations on learning Boolean formulae and finite automata. In Proceedings of the Twenty-First Annual ACM Symposium on Theory of Computing, pages 433–444, 1989.
  16. On the boosting ability of top-down decision tree learning algorithms. In Proceedings of the Twenty-Eighth Annual ACM Symposium on Theory of Computing, pages 459–468, 1996.
  17. Kasper Green Larsen. Bagging is an optimal PAC learner. In COLT, volume 195 of Proceedings of Machine Learning Research, pages 450–468. PMLR, 2023.
  18. Adaptive martingale boosting. Advances in Neural Information Processing Systems, 21, 2008.
  19. Algorithms and hardness results for parallel large margin learning. In J. Shawe-Taylor, R. Zemel, P. Bartlett, F. Pereira, and K.Q. Weinberger, editors, Advances in Neural Information Processing Systems, volume 24. Curran Associates, Inc., 2011.
  20. Martingale boosting. In Learning Theory: 18th Annual Conference on Learning Theory, COLT 2005, Bertinoro, Italy, June 27-30, 2005. Proceedings 18, pages 79–94. Springer, 2005.
  21. Algorithms for parallel boosting. In Fourth International Conference on Machine Learning and Applications (ICMLA'05), 6 pp. IEEE, 2005.
  22. Boosting using branching programs. Journal of Computer and System Sciences, 64(1):103–112, 2002.
  23. Foundations of machine learning. MIT Press, 2018.
  24. Stephen J. Montgomery-Smith. The distribution of Rademacher sums. Proceedings of the American Mathematical Society, 109(2):517–522, 1990.
  25. Smooth sensitivity and sampling in private data analysis. In STOC, pages 75–84. ACM, 2007.
  26. Scalable and parallel boosting with mapreduce. IEEE Transactions on Knowledge and Data Engineering, 24(10):1904–1916, 2011.
  27. Robert E. Schapire. The strength of weak learnability. Machine Learning, 5:197–227, 1990.
  28. Federated functional gradient boosting. In International Conference on Artificial Intelligence and Statistics, pages 7814–7840. PMLR, 2022.
  29. Differentially private feature selection via stability arguments, and the robustness of the lasso. In COLT, volume 30 of JMLR Workshop and Conference Proceedings, pages 819–850. JMLR.org, 2013.
  30. On the uniform convergence of relative frequencies of events to their probabilities. In Measures of Complexity: Festschrift for Alexey Chervonenkis, pages 11–30. Springer, 2015.
  31. C Yu and DB Skillicorn. Parallelizing boosting and bagging. Queen’s University, Kingston, Canada, Tech. Rep, 2001.