
PAC-Bayes-Chernoff bounds for unbounded losses (2401.01148v4)

Published 2 Jan 2024 in stat.ML and cs.LG

Abstract: We introduce a new PAC-Bayes oracle bound for unbounded losses that extends Cramér-Chernoff bounds to the PAC-Bayesian setting. The proof technique relies on controlling the tails of certain random variables involving the Cramér transform of the loss. Our approach naturally leverages properties of Cramér-Chernoff bounds, such as exact optimization of the free parameter in many PAC-Bayes bounds. We highlight several applications of the main theorem. Firstly, we show that our bound recovers and generalizes previous results. Additionally, our approach allows working with richer assumptions that result in more informative and potentially tighter bounds. In this direction, we provide a general bound under a new model-dependent assumption from which we obtain bounds based on parameter norms and log-Sobolev inequalities. Notably, many of these bounds can be minimized to obtain distributions beyond the Gibbs posterior and provide novel theoretical coverage to existing regularization techniques.
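To make the classical Cramér-Chernoff machinery that the paper builds on concrete, here is a minimal, illustrative Python sketch (not the paper's method): it computes the Cramér transform ψ*(t) = sup_λ (λt − ψ(λ)) of a cumulant generating function ψ by grid search, and uses it in the standard Chernoff tail bound P(X̄_n ≥ t) ≤ exp(−n ψ*(t)). The Gaussian example and all function names are assumptions chosen for illustration.

```python
import math

def log_mgf_gaussian(lam, sigma=1.0):
    # Cumulant generating function psi(lambda) = log E[exp(lambda * X)]
    # for a centered Gaussian X ~ N(0, sigma^2): psi(lambda) = lambda^2 sigma^2 / 2.
    return 0.5 * (lam * sigma) ** 2

def cramer_transform(t, log_mgf, lam_grid):
    # Cramer transform psi*(t) = sup_{lambda} (lambda * t - psi(lambda)),
    # approximated by a grid search over lambda >= 0.
    return max(lam * t - log_mgf(lam) for lam in lam_grid)

def chernoff_tail_bound(t, n, log_mgf, lam_grid):
    # Cramer-Chernoff bound: P(mean of n i.i.d. samples >= t) <= exp(-n * psi*(t)).
    return math.exp(-n * cramer_transform(t, log_mgf, lam_grid))

lam_grid = [i * 0.001 for i in range(5001)]  # lambda in [0, 5]

# For N(0, 1) the closed form is psi*(t) = t^2 / 2, so psi*(1) = 0.5.
rate = cramer_transform(1.0, log_mgf_gaussian, lam_grid)
bound = chernoff_tail_bound(1.0, 100, log_mgf_gaussian, lam_grid)
print(rate, bound)
```

The "exact optimization of the free parameter" mentioned in the abstract corresponds here to taking the true supremum over λ (analytically λ = t/σ² in the Gaussian case) instead of fixing a single λ in advance, which is what many earlier PAC-Bayes bounds effectively do.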

