Lassoed Tree Boosting (2205.10697v6)

Published 22 May 2022 in stat.ML, cs.LG, math.ST, and stat.TH

Abstract: Gradient boosting performs exceptionally in most prediction problems and scales well to large datasets. In this paper we prove that a "lassoed" gradient boosted tree algorithm with early stopping achieves faster than $n^{-1/4}$ $L^2$ convergence in the large nonparametric space of càdlàg functions of bounded sectional variation. This rate is remarkable because it does not depend on the dimension, sparsity, or smoothness. We use simulation and real data to confirm our theory and demonstrate empirical performance and scalability on par with standard boosting. Our convergence proofs are based on a novel, general theorem on early stopping with empirical loss minimizers of nested Donsker classes.
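
A minimal sketch of the idea described in the abstract, assuming a scikit-learn environment: boosted regression trees are fit with early stopping, each tree's leaves are treated as indicator basis functions, and a cross-validated lasso is then fit over that basis. This illustrates the general "boosted trees + L1 penalty + early stopping" recipe, not the authors' exact algorithm; the dataset, the leaf-indicator construction, and all hyperparameters below are illustrative assumptions.

```python
# Illustrative sketch only -- not the paper's exact "lassoed tree boosting"
# procedure. Pipeline: boosted trees with early stopping -> leaf-indicator
# basis -> cross-validated lasso over that basis.
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LassoCV
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

X, y = make_friedman1(n_samples=2000, n_features=10, noise=1.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Step 1: gradient boosted trees with early stopping on a held-out fraction.
gbm = GradientBoostingRegressor(
    n_estimators=300, learning_rate=0.1, max_depth=3,
    validation_fraction=0.2, n_iter_no_change=10, random_state=0,
)
gbm.fit(X_tr, y_tr)

# Step 2: encode the leaf each observation falls into, per tree, as a
# one-hot basis (one column per leaf across the ensemble).
enc = OneHotEncoder(handle_unknown="ignore")
B_tr = enc.fit_transform(gbm.apply(X_tr).reshape(X_tr.shape[0], -1))
B_te = enc.transform(gbm.apply(X_te).reshape(X_te.shape[0], -1))

# Step 3: lasso over the leaf basis, with the penalty level chosen by CV.
# The L1 penalty shrinks and selects leaf contributions, analogous in spirit
# to constraining the variation norm of the fitted function.
lasso = LassoCV(cv=5, max_iter=5000, random_state=0)
lasso.fit(B_tr.toarray(), y_tr)

mse_gbm = np.mean((gbm.predict(X_te) - y_te) ** 2)
mse_lassoed = np.mean((lasso.predict(B_te.toarray()) - y_te) ** 2)
print(f"boosting MSE: {mse_gbm:.3f}   lassoed-basis MSE: {mse_lassoed:.3f}")
```

In a fuller treatment the number of boosting rounds (the early-stopping point) and the lasso penalty would be tuned jointly by cross-validation; they are selected independently here for brevity.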
