A note on generalization bounds for losses with finite moments (2403.16681v1)

Published 25 Mar 2024 in stat.ML and cs.LG

Abstract: This paper studies the truncation method from Alquier [1] to derive high-probability PAC-Bayes bounds for unbounded losses with heavy tails. Assuming that the $p$-th moment is bounded, the resulting bounds interpolate between a slow rate $1 / \sqrt{n}$ when $p=2$, and a fast rate $1 / n$ when $p \to \infty$ and the loss is essentially bounded. Moreover, the paper derives a high-probability PAC-Bayes bound for losses with a bounded variance. This bound has an exponentially better dependence on the confidence parameter and the dependency measure than previous bounds in the literature. Finally, the paper extends all results to guarantees in expectation and single-draw PAC-Bayes. In order to do so, it obtains analogues of the PAC-Bayes fast-rate bound for bounded losses from [2] in these settings.
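For intuition, the interpolation between the slow and fast rates can be read off a bound of the following schematic shape. This is an illustrative sketch only, assuming the $p$-th moment of the loss is bounded by some $m_p$; the constant $c$ and the notation $m_p$ are placeholders, and the precise statement (including the exact dependence on the posterior $\rho$, prior $\pi$, and confidence $\delta$) is in the paper:

\[
\mathbb{E}_{h \sim \rho}\big[L(h)\big] \;\lesssim\; \mathbb{E}_{h \sim \rho}\big[\hat{L}_n(h)\big] \;+\; c \, m_p^{1/p} \left( \frac{\mathrm{KL}(\rho \,\|\, \pi) + \log(1/\delta)}{n} \right)^{\!\frac{p-1}{p}}
\]

where $\hat{L}_n$ is the empirical risk on $n$ samples. Setting $p = 2$ gives exponent $1/2$ and recovers the $1/\sqrt{n}$ slow rate, while letting $p \to \infty$ drives the exponent to $1$ and yields the $1/n$ fast rate, matching the interpolation described in the abstract.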

References (29)
  1. P. Alquier, “Transductive and inductive adaptative inference for regression and density estimation,” Ph.D. dissertation, Université Paris 6, 2006.
  2. B. Rodríguez-Gálvez, R. Thobaben, and M. Skoglund, “More PAC-Bayes bounds: From bounded losses, to losses with general tail behaviors, to anytime-validity,” 2023. [Online]. Available: https://arxiv.org/abs/2306.12214
  3. J. Shawe-Taylor and R. C. Williamson, “A PAC analysis of a Bayesian estimator,” in Proceedings of the tenth annual conference on Computational Learning Theory, 1997, pp. 2–9.
  4. D. A. McAllester, “Some PAC-Bayesian theorems,” in Proceedings of the eleventh annual conference on Computational Learning Theory, 1998, pp. 230–234.
  5. ——, “PAC-Bayesian model averaging,” in Proceedings of the twelfth annual conference on Computational Learning Theory, 1999, pp. 164–170.
  6. ——, “PAC-Bayesian stochastic model selection,” Machine Learning, vol. 51, no. 1, pp. 5–21, 2003.
  7. A. Maurer, “A note on the PAC Bayesian theorem,” arXiv preprint cs/0411099, 2004.
  8. P. Germain, A. Lacasse, F. Laviolette, M. Marchand, and J.-F. Roy, “Risk bounds for the majority vote: From a PAC-Bayesian analysis to a learning algorithm,” Journal of Machine Learning Research, vol. 16, no. 26, pp. 787–860, 2015.
  9. T. Zhang, “Information-theoretic upper and lower bounds for statistical estimation,” IEEE Transactions on Information Theory, vol. 52, no. 4, pp. 1307–1321, 2006.
  10. P. K. Banerjee and G. Montúfar, “Information complexity and generalization bounds,” in 2021 IEEE International Symposium on Information Theory (ISIT). IEEE, 2021, pp. 676–681.
  11. F. Hellström and G. Durisi, “Generalization bounds via information density and conditional information density,” IEEE Journal on Selected Areas in Information Theory, vol. 1, no. 3, pp. 824–839, 2020.
  12. B. Guedj and L. Pujol, “Still no free lunches: the price to pay for tighter PAC-Bayes bounds,” Entropy, vol. 23, no. 11, p. 1529, 2021.
  13. S. Asmussen, J. L. Jensen, and L. Rojas-Nandayapa, “On the Laplace transform of the lognormal distribution,” Methodology and Computing in Applied Probability, vol. 18, pp. 441–458, 2016.
  14. Z. Wang, L. Shen, Y. Miao, S. Chen, and W. Xu, “PAC-Bayesian inequalities of some random variables sequences,” Journal of Inequalities and Applications, vol. 2015, no. 1, pp. 1–8, 2015.
  15. M. Haddouche and B. Guedj, “PAC-Bayes generalisation bounds for heavy-tailed losses through supermartingales,” Transactions on Machine Learning Research, 2023. [Online]. Available: https://openreview.net/forum?id=qxrwt6F3sf
  16. B. Chugg, H. Wang, and A. Ramdas, “A unified recipe for deriving (time-uniform) PAC-Bayes bounds,” arXiv preprint arXiv:2302.03421, 2023.
  17. M. Holland, “PAC-Bayes under potentially heavy tails,” Advances in Neural Information Processing Systems, vol. 32, 2019.
  18. P. Alquier and B. Guedj, “Simpler PAC-Bayesian bounds for hostile data,” Machine Learning, vol. 107, no. 5, pp. 887–902, 2018.
  19. M. Haddouche, B. Guedj, O. Rivasplata, and J. Shawe-Taylor, “PAC-Bayes unleashed: Generalisation bounds with unbounded losses,” Entropy, vol. 23, no. 10, p. 1330, 2021.
  20. P. Alquier, “User-friendly introduction to PAC-Bayes bounds,” arXiv preprint arXiv:2110.11216, 2021.
  21. Y. Ohnishi and J. Honorio, “Novel change of measure inequalities with applications to PAC-Bayesian bounds and Monte Carlo estimation,” in International Conference on Artificial Intelligence and Statistics. PMLR, 2021, pp. 1711–1719.
  22. J. Langford and M. Seeger, “Bounds for averaging classifiers,” School of Computer Science, Carnegie Mellon University, Tech. Rep., 2001.
  23. M. Seeger, “PAC-Bayesian generalisation error bounds for Gaussian process classification,” Journal of Machine Learning Research, vol. 3, pp. 233–269, 2002.
  24. O. Catoni, “PAC-Bayesian supervised classification: The thermodynamics of statistical learning,” IMS Lecture Notes Monograph Series, vol. 56, 2007.
  25. B. Rodríguez-Gálvez, G. Bassi, R. Thobaben, and M. Skoglund, “Tighter expected generalization error bounds via Wasserstein distance,” Advances in Neural Information Processing Systems, vol. 34, pp. 19109–19121, 2021.
  26. M. D. Donsker and S. S. Varadhan, “Asymptotic evaluation of certain Markov process expectations for large time, I,” Communications on Pure and Applied Mathematics, vol. 28, no. 1, pp. 1–47, 1975.
  27. P. Germain, A. Lacasse, F. Laviolette, and M. Marchand, “PAC-Bayesian learning of linear classifiers,” in Proceedings of the 26th Annual International Conference on Machine Learning, 2009, pp. 353–360.
  28. O. Rivasplata, I. Kuzborskij, C. Szepesvári, and J. Shawe-Taylor, “PAC-Bayes analysis beyond the usual bounds,” Advances in Neural Information Processing Systems, vol. 33, pp. 16833–16845, 2020.
  29. L. Bégin, P. Germain, F. Laviolette, and J.-F. Roy, “PAC-Bayesian theory for transductive learning,” in Artificial Intelligence and Statistics. PMLR, 2014, pp. 105–113.
