On the Rashomon ratio of infinite hypothesis sets (2404.17746v1)

Published 27 Apr 2024 in cs.LG, math.PR, and stat.ML

Abstract: Given a classification problem and a family of classifiers, the Rashomon ratio measures the proportion of classifiers in the family whose loss falls below a given threshold. Previous work has explored the advantage of a large Rashomon ratio in the case of a finite family of classifiers. Here we consider the more general case of an infinite family. We show that a large Rashomon ratio guarantees that choosing the classifier with the best empirical accuracy among a random subset of the family, which is likely to improve generalizability, will not increase the empirical loss too much. We quantify the Rashomon ratio in two examples involving infinite classifier families in order to illustrate situations in which it is large. In the first example, we estimate the Rashomon ratio for the classification of normally distributed classes using affine classifiers. In the second, we obtain a lower bound for the Rashomon ratio of a classification problem with a modified Gram matrix when the classifier family consists of two-layer ReLU neural networks. In general, we show that the Rashomon ratio can be estimated using a training dataset along with random samples from the classifier family, and we provide guarantees that such an estimate is close to the true value of the Rashomon ratio.
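
The abstract's closing claim, that the Rashomon ratio can be estimated from a training set together with random draws from the classifier family, amounts to a simple Monte Carlo procedure: sample classifiers from a reference distribution over the family and count the fraction whose empirical loss falls below the threshold. The sketch below illustrates this for the paper's first example, affine classifiers separating two normally distributed classes. The sampling distribution over weights, the threshold gamma, and all names here are illustrative assumptions, not the authors' implementation.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy data: two normally distributed classes in the plane
    # (mirroring the paper's first example).
    n = 500
    X = np.vstack([rng.normal(-1.0, 1.0, size=(n, 2)),
                   rng.normal(+1.0, 1.0, size=(n, 2))])
    y = np.hstack([-np.ones(n), np.ones(n)])

    def empirical_loss(w, b):
        """0-1 training loss of the affine classifier x -> sign(w . x + b)."""
        return np.mean(np.sign(X @ w + b) != y)

    # Monte Carlo estimate of the Rashomon ratio: the fraction of classifiers,
    # drawn from an assumed reference distribution over the family, whose
    # empirical loss is below the threshold gamma.
    gamma = 0.25          # illustrative loss threshold (an assumption)
    num_samples = 10_000  # random draws from the classifier family
    hits = sum(empirical_loss(rng.normal(size=2), rng.normal()) < gamma
               for _ in range(num_samples))
    print(f"Estimated Rashomon ratio: {hits / num_samples:.3f}")

Because each draw contributes an independent Bernoulli indicator, standard concentration bounds such as Hoeffding's inequality control how far this estimate can stray from the true ratio; this is the flavor of guarantee the abstract refers to, though the paper's precise statement may differ.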
