
Comparing Comparators in Generalization Bounds (2310.10534v2)

Published 16 Oct 2023 in cs.LG, cs.IT, math.IT, math.ST, stat.ML, and stat.TH

Abstract: We derive generic information-theoretic and PAC-Bayesian generalization bounds involving an arbitrary convex comparator function, which measures the discrepancy between the training and population loss. The bounds hold under the assumption that the cumulant-generating function (CGF) of the comparator is upper-bounded by the corresponding CGF within a family of bounding distributions. We show that the tightest possible bound is obtained with the comparator being the convex conjugate of the CGF of the bounding distribution, also known as the Cramér function. This conclusion applies more broadly to generalization bounds with a similar structure. This confirms the near-optimality of known bounds for bounded and sub-Gaussian losses and leads to novel bounds under other bounding distributions.
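To make the role of the Cramér function concrete, here is a minimal worked sketch for the sub-Gaussian case mentioned in the abstract. The notation (psi for the CGF, Delta for the training-population discrepancy, B for a generic complexity term) and the final bound shape are illustrative assumptions for this note, not the paper's exact statements.

% Sketch: Cramér function of a Gaussian (sub-Gaussian) bounding distribution.
% CGF of a zero-mean Gaussian with variance \sigma^2:
\psi(\lambda) = \frac{\sigma^2 \lambda^2}{2}.
% Its convex conjugate, i.e. the Cramér function, is obtained by maximizing over \lambda
% (the maximizer is \lambda = \Delta/\sigma^2):
\psi^*(\Delta) = \sup_{\lambda \in \mathbb{R}} \Bigl\{ \lambda \Delta - \frac{\sigma^2 \lambda^2}{2} \Bigr\}
              = \frac{\Delta^2}{2\sigma^2}.
% If this Cramér function is used as the comparator d(\hat{L}, L) = \psi^*(L - \hat{L}),
% then a generic bound of the form d(\hat{L}, L) \le B, with B an information-theoretic or
% PAC-Bayesian complexity term, rearranges to
\lvert L - \hat{L} \rvert \le \sqrt{2\sigma^2 B},
% which is the familiar square-root shape of sub-Gaussian generalization bounds.

This illustrates the abstract's claim in one special case: taking the comparator to be the Cramér function of the bounding distribution reproduces the known bound shape for sub-Gaussian losses, and analogously the binary relative entropy plays this role for bounded losses.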
