Statistical learning theory and Occam's razor: The core argument (2312.13842v2)
Published 21 Dec 2023 in cs.LG, math.ST, and stat.TH
Abstract: Statistical learning theory is often associated with the principle of Occam's razor, which recommends a simplicity preference in inductive inference. This paper distills the core argument for simplicity obtainable from statistical learning theory, built on the theory's central learning guarantee for the method of empirical risk minimization. This core "means-ends" argument is that a simpler hypothesis class or inductive model is better because it has better learning guarantees; however, these guarantees are model-relative and so the theoretical push towards simplicity is checked by our prior knowledge.
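As background for the abstract's appeal to the "central learning guarantee" for empirical risk minimization, here is one standard textbook form of that guarantee (a reference sketch following e.g. Shalev-Shwartz and Ben-David, 2014; exact constants vary across sources): for a hypothesis class $\mathcal{H}$ of finite VC dimension $d$, with probability at least $1-\delta$ over an i.i.d. sample $S$ of size $m$, every $h \in \mathcal{H}$ satisfies

$$
L_{\mathcal{D}}(h) \;\le\; \widehat{L}_S(h) + c\,\sqrt{\frac{d\ln(2m/d) + \ln(4/\delta)}{m}},
$$

where $L_{\mathcal{D}}(h)$ is the true risk under the data distribution $\mathcal{D}$, $\widehat{L}_S(h)$ is the empirical risk on $S$, and $c$ is an absolute constant. The bound tightens as $d$ shrinks, which is the sense in which a simpler class gives ERM a better guarantee; but it only controls performance relative to the best hypothesis in $\mathcal{H}$ itself, which is the model-relativity the abstract flags.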
- S. Agarwal, N. Ananthakrishnan, S. Ben-David, T. Lechner, and R. Urner. On learnability with computable learners. In A. Kontorovich and G. Neu, editors, Proceedings of the 31st International Conference on Algorithmic Learning Theory (ALT 2020), volume 117 of Proceedings of Machine Learning Research, pages 48–60, San Diego, CA, 2020.
- E. Alpaydin. Introduction to Machine Learning. Adaptive Computing and Machine Learning. MIT Press, fourth edition, 2020.
- D. Angluin and P. Laird. Learning from noisy examples. Machine Learning, 2:343–370, 1988.
- M. Anthony and N. Biggs. Computational Learning Theory, volume 30 of Cambridge Tracts in Theoretical Computer Science. Cambridge University Press, 1992.
- A. Baker. Simplicity. In E. N. Zalta, editor, The Stanford Encyclopedia of Philosophy. Metaphysics Research Lab, Stanford University, Summer 2022 edition, 2022.
- P. S. Bandyopadhyay and M. R. Forster, editors. Philosophy of Statistics, volume 7 of Handbook of the Philosophy of Science. Elsevier, 2011.
- F. J. Bargagli Stoffi, G. Cevolani, and G. Gnecco. Simple models in complex worlds: Occam’s razor and statistical learning theory. Minds and Machines, 32(1):13–42, 2022.
- P. L. Bartlett, A. Montanari, and A. Rakhlin. Deep learning: A statistical viewpoint. Acta Numerica, 30:87–201, 2021.
- C. Beisbart and T. Räz. Philosophy of science at sea: Clarifying the interpretability of machine learning. Philosophy Compass, 17(6), 2022.
- M. Belkin. Fit without fear: Remarkable mathematical phenomena of deep learning through the prism of interpolation. Acta Numerica, 30:203–248, 2021.
- J. Berner, P. Grohs, G. Kutyniok, and P. Petersen. The modern mathematics of deep learning. In P. Grohs and G. Kutyniok, editors, Mathematical Aspects of Deep Learning, pages 1–111. Cambridge University Press, 2022.
- A. Blumer, A. Ehrenfeucht, D. Haussler, and M. K. Warmuth. Classifying learnable geometric concepts with the Vapnik-Chervonenkis dimension. In J. Hartmanis, editor, Proceedings of the 18th Annual ACM Symposium on Theory of Computing (STOC ’86), pages 273–282, 1986.
- A. Blumer, A. Ehrenfeucht, D. Haussler, and M. K. Warmuth. Occam’s razor. Information Processing Letters, 24:377–380, 1987.
- A. Blumer, A. Ehrenfeucht, D. Haussler, and M. K. Warmuth. Learnability and the Vapnik-Chervonenkis dimension. Journal of the Association for Computing Machinery, 36(4):929–965, 1989.
- T. Bonk. Function spaces, simplicity and curve fitting. Synthese, 201:58, 2023.
- O. Bousquet, S. Hanneke, S. Moran, R. van Handel, and A. Yehudayoff. A theory of universal learning. In S. Khuller and V. V. Williams, editors, Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing (STOC ’21), pages 532–541. ACM, 2021.
- L. Breiman. Statistical modeling: The two cultures (with comments and a rejoinder by the author). Statistical Science, 16(3):199–231, 2001.
- V. Cherkassky and F. Mulier. Statistical Learning From Data: Concepts, Theory, and Methods. Wiley, second edition, 2007.
- D. Cohn and G. Tesauro. How tight are the Vapnik-Chervonenkis bounds? Neural Computation, 4(2):249–269, 1992.
- D. Corfield, B. Schölkopf, and V. Vapnik. Falsificationism and statistical learning theory: Comparing the Popper and Vapnik-Chervonenkis dimensions. Journal for General Philosophy of Science, 40(1):51–58, 2009.
- T. M. Cover. Geometrical and statistical properties of systems of linear inequalities with applications in pattern recognition. IEEE Transactions on Electronic Computers, EC-14(3):326–334, 1965.
- L. Devroye, L. Györfi, and G. Lugosi. A Probabilistic Theory of Pattern Recognition, volume 31 of Applications of Mathematics: Stochastic Modelling and Applied Probability. Springer, 1996.
- P. Domingos. Occam’s two razors: The sharp and the blunt. In R. Agrawal, P. E. Stolorz, and G. Piatetsky-Shapiro, editors, Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining (KDD’98), pages 37–43. AAAI Press, 1998.
- P. Domingos. The role of Occam’s razor in knowledge discovery. Data Mining and Knowledge Discovery, 3(4):409–425, 1999.
- P. Domingos. A few useful things to know about machine learning. Communications of the ACM, 55(10):78–87, 2012.
- R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification. Wiley, second edition, 2001.
- T. van Erven, P. D. Grünwald, N. A. Mehta, M. D. Reid, and R. C. Williamson. Fast rates in statistical and online learning. Journal of Machine Learning Research, 16(54):1793–1861, 2015.
- M. R. Forster and E. Sober. How to tell when simpler, more unified, or less ad hoc theories will provide more accurate predictions. British Journal for the Philosophy of Science, 45(1):1–35, 1994.
- T. Freiesleben and G. König. Dear XAI community, we need to talk! Fundamental misconceptions in current XAI research. In Proceedings of the 1st World Conference on eXplainable Artificial Intelligence (XAI 2023), forthcoming.
- K. Genin. The Topology of Statistical Inquiry. PhD Dissertation, Carnegie Mellon University, Pittsburgh, 2018.
- E. M. Gold. Language identification in the limit. Information and Control, 10(5):447–474, 1967.
- I. Goodfellow, Y. Bengio, and A. Courville. Deep Learning. Adaptive Computation and Machine Learning. MIT Press, 2016.
- P. D. Grünwald. The Minimum Description Length Principle. MIT Series in Adaptive Computation and Machine Learning. MIT Press, 2007.
- M. Hardt and B. Recht. Patterns, Predictions, and Actions: Foundations of Machine Learning. Princeton University Press, 2022.
- G. Harman and S. Kulkarni. Reliable Reasoning: Induction and Statistical Learning Theory. The Jean Nicod Lectures. A Bradford Book. MIT Press, 2007.
- T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer Series in Statistics. Springer, second edition, 2009.
- D. A. Herrmann. PAC learning and Occam’s razor: Probably approximately incorrect. Philosophy of Science, 87(4):685–703, 2020.
- S. Jain, D. Osherson, J. S. Royer, and A. Sharma. Systems That Learn: An Introduction to Learning Theory. A Bradford Book. MIT Press, second edition, 1999.
- H. Jeffreys. Theory of Probability. Clarendon Press, 1939.
- M. J. Kearns and U. V. Vazirani. An Introduction to Computational Learning Theory. MIT Press, 1994.
- K. T. Kelly. The Logic of Reliable Inquiry. Logic and Computation in Philosophy. Oxford University Press, 1996.
- K. T. Kelly. Ockham’s razor, truth, and information. In J. F. van Benthem and P. W. Adriaans, editors, Handbook of the Philosophy of Information, volume 8 of Handbook of the Philosophy of Science, pages 321–360. Elsevier, 2008.
- K. T. Kelly. Simplicity, truth, and probability. In Bandyopadhyay and Forster (2011), pages 983–1024.
- K. T. Kelly. Learning theory and epistemology. In H. Arló-Costa, V. F. Hendricks, and J. F. A. K. van Benthem, editors, Readings in Formal Epistemology, volume 1 of Graduate Texts in Philosophy, pages 695–716. Springer, 2016.
- I. Levi. The Enterprise of Knowledge: An Essay on Knowledge, Credal Probability, and Chance. MIT Press, 1980.
- I. Levi. Pragmatism and change of view. In C. Misak, editor, Pragmatism, number 24 in Canadian Journal of Philosophy Supplementary Volume, pages 177–201. Cambridge University Press, 1998.
- I. Levi. Mild Contraction: Evaluating Loss of Information due to Loss of Belief. Clarendon Press, 2004.
- M. Li and P. M. B. Vitányi. An Introduction to Kolmogorov Complexity and Its Applications. Texts in Computer Science. Springer, third edition, 2008.
- T. M. Mitchell. Machine Learning. McGraw-Hill, 1997.
- J. Pearl. On the connection between the complexity and credibility of inferred models. International Journal of General Systems, 4(4):255–264, 1978.
- K. Popper. The Logic of Scientific Discovery. Hutchinson, 1959. Republished in Routledge Classics, 2002.
- G. Priest. Gruesome simplicity. Philosophy of Science, 43(3):432–437, 1976.
- J.-W. Romeijn. Inherent complexity: A problem for statistical model evaluation. Philosophy of Science, 84(5):797–809, 2017.
- S. Russell. Inductive learning by machines. Philosophical Studies, 64(1):37–64, 1991.
- C. Schaffer. Overfitting avoidance as bias. Machine Learning, 10:153–178, 1993.
- C. Schaffer. A conservation law for generalization performance. In W. W. Cohen and H. Hirsh, editors, Proceedings of the 11th International Conference on Machine Learning (ICML 1994), pages 259–265, San Francisco, CA, 1994. Morgan Kaufmann.
- O. Schulte. Means-ends epistemology. The British Journal for the Philosophy of Science, 50(1):1–31, 1999.
- O. Schulte. Formal Learning Theory. In E. N. Zalta, editor, The Stanford Encyclopedia of Philosophy. Metaphysics Research Lab, Stanford University, 2017.
- D. Sculley, J. Snoek, A. Wiltschko, and A. Rahimi. Winner’s curse? On pace, progress, and empirical rigor. In Workshop Proceedings of the 6th International Conference on Learning Representations (ICLR), 2018.
- S. Shalev-Shwartz and S. Ben-David. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014.
- E. Sober. Ockham’s Razors: A User’s Manual. Cambridge University Press, 2015.
- D. Steel. Testability and Ockham’s razor: How formal and statistical learning theory converge in the new riddle of induction. Journal of Philosophical Logic, 38(5):471–489, 2009.
- D. Steel. Testability and statistical learning theory. In Bandyopadhyay and Forster (2011), pages 849–861.
- T. F. Sterkenburg. Solomonoff prediction and Occam’s razor. Philosophy of Science, 83(4):459–479, 2016.
- T. F. Sterkenburg. Universal Prediction: A Philosophical Investigation. PhD Dissertation, University of Groningen, 2018.
- T. F. Sterkenburg. On characterizations of learnability with computable learners. In P.-L. Loh and M. Raginsky, editors, Proceedings of the Thirty-Fifth Conference on Learning Theory (COLT 2022), volume 178 of Proceedings of Machine Learning Research, pages 3365–3379. PMLR, 2022.
- T. F. Sterkenburg and P. D. Grünwald. The no-free-lunch theorems of supervised learning. Synthese, 199:9979–10015, 2021.
- L. G. Valiant. A theory of the learnable. Communications of the ACM, 27(11):1134–1142, 1984.
- B. C. van Fraassen. Laws and Symmetry. Clarendon Press, 1989.
- B. C. van Fraassen. The false hopes of traditional epistemology. Philosophy and Phenomenological Research, 60(2):253–280, 2000.
- B. C. van Fraassen. The Empirical Stance. The Terry Lectures. Yale University Press, 2004.
- V. N. Vapnik. Statistical Learning Theory. Wiley, 1998.
- V. N. Vapnik. An overview of statistical learning theory. IEEE Transactions on Neural Networks, 10(5):988–999, 1999.
- V. N. Vapnik. The Nature of Statistical Learning Theory. Statistics for Engineering and Information Science. Springer, second edition, 2000.
- V. N. Vapnik and A. Ya. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and Its Applications, 16(2):264–280, 1971. Translation of the Russian original in Teoriya Veroyatnostei i ee Primeneniya, 16(2):264–279, 1971.
- U. von Luxburg and B. Schölkopf. Statistical learning theory: Models, concepts, and results. In D. M. Gabbay, S. Hartmann, and J. Woods, editors, Inductive Logic, volume 10 of Handbook of the History of Logic, pages 651–706. Elsevier, 2011.
- G. I. Webb. Further experimental evidence against the utility of Occam’s razor. Journal of Artificial Intelligence Research, 4:397–417, 1996.
- O. Wiles, S. Gowal, F. Stimberg, S. Alvise-Rebuffi, I. Ktena, K. Dvijotham, and A. T. Cemgil. A fine-grained analysis on distribution shift. In Proceedings of the Tenth International Conference on Learning Representations (ICLR), pages 1–15, 2022.
- D. H. Wolpert. On the connection between in-sample testing and generalization error. Complex Systems, 6:47–94, 1992.
- D. H. Wolpert. The lack of a priori distinctions between learning algorithms. Neural Computation, 8(7):1341–1390, 1996.