
Leveraging PAC-Bayes Theory and Gibbs Distributions for Generalization Bounds with Complexity Measures

Published 19 Feb 2024 in stat.ML and cs.LG | (2402.13285v1)

Abstract: In statistical learning theory, a generalization bound usually involves a complexity measure imposed by the considered theoretical framework. This limits the scope of such bounds, as other forms of capacity measures or regularizations are used in algorithms. In this paper, we leverage the framework of disintegrated PAC-Bayes bounds to derive a general generalization bound instantiable with arbitrary complexity measures. One trick to prove such a result involves considering a commonly used family of distributions: the Gibbs distributions. Our bound stands in probability jointly over the hypothesis and the learning sample, which allows the complexity to be adapted to the generalization gap as it can be customized to fit both the hypothesis class and the task.
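As a rough illustration of the Gibbs distributions mentioned in the abstract, the sketch below builds one over a small finite set of hypotheses, with weights proportional to a reference weight times exp(-alpha * complexity). It is not the paper's bound or algorithm: the finite hypothesis set, the uniform reference distribution, the parameter name `alpha`, and the complexity values are all assumptions chosen for the example.

```python
import math

def gibbs_distribution(complexities, alpha=1.0, prior=None):
    """Return weights proportional to prior[i] * exp(-alpha * complexities[i]).

    `complexities` holds one user-chosen complexity value per hypothesis
    (hypothetical numbers here). `prior` must be strictly positive; it
    defaults to a uniform reference distribution (an assumption).
    """
    n = len(complexities)
    if prior is None:
        prior = [1.0 / n] * n
    # Work in log space and subtract the maximum for numerical stability.
    log_w = [math.log(p) - alpha * c for p, c in zip(prior, complexities)]
    m = max(log_w)
    unnorm = [math.exp(lw - m) for lw in log_w]
    z = sum(unnorm)
    return [u / z for u in unnorm]

# Hypothetical complexity values for four candidate hypotheses.
mu = [0.2, 1.0, 3.0, 0.5]
weights = gibbs_distribution(mu, alpha=2.0)
```

Larger values of `alpha` concentrate the mass on low-complexity hypotheses, while `alpha = 0` returns the reference distribution unchanged.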
