Bridging Algorithmic Information Theory and Machine Learning: A New Approach to Kernel Learning (2311.12624v3)
Abstract: Machine Learning (ML) and Algorithmic Information Theory (AIT) approach complexity from different points of view. We explore the interface between AIT and kernel methods, which are prevalent in ML, by adopting an AIT perspective on the problem of learning kernels from data in kernel ridge regression through the method of Sparse Kernel Flows. In particular, by examining the differences and commonalities between Minimum Description Length (MDL) and Regularization in Machine Learning (RML), we prove that the method of Sparse Kernel Flows is the natural approach for learning kernels from data. This approach aligns naturally with the MDL principle and offers a more robust theoretical basis than the existing reliance on cross-validation. The study shows that deriving Sparse Kernel Flows does not require a statistical approach; instead, one can work directly with code-lengths and complexities, concepts central to AIT. This opens the door to reformulating machine learning algorithms using tools from AIT, with the aim of providing them with a more solid theoretical foundation.
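To make the objects named in the abstract concrete, the following is a minimal sketch of a Kernel Flows style loss with a sparsity penalty. It is an illustration under stated assumptions, not the paper's implementation. The relative-norm term `rho` follows the Kernel Flows loss of Owhadi and Yoo (2019). The squared-weight combination of Gaussian base kernels, the L1 penalty, the nugget value, and all function and parameter names are assumptions chosen for the example.

```python
import numpy as np

def rbf_kernel(X, Y, lengthscale):
    """Gaussian base kernel between the rows of X and the rows of Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * lengthscale ** 2))

def combined_kernel(X, Y, weights, lengthscales):
    """Assumed parametrisation: sum of base kernels with squared weights."""
    return sum(w ** 2 * rbf_kernel(X, Y, s) for w, s in zip(weights, lengthscales))

def rkhs_norm_sq(K, y, nugget=1e-6):
    """y^T (K + nugget I)^{-1} y: squared RKHS norm of the regularised interpolant."""
    return float(y @ np.linalg.solve(K + nugget * np.eye(len(y)), y))

def sparse_kf_loss(X, y, weights, lengthscales, idx_half, l1_penalty):
    """Kernel Flows loss rho plus an L1 penalty on the kernel weights (assumed form)."""
    weights = np.asarray(weights, dtype=float)
    K = combined_kernel(X, X, weights, lengthscales)
    full = rkhs_norm_sq(K, y)
    half = rkhs_norm_sq(K[np.ix_(idx_half, idx_half)], y[idx_half])
    rho = 1.0 - half / full  # small when a subsample explains the data almost as well
    return rho + l1_penalty * np.abs(weights).sum()

# Toy usage: 8 points on [0, 1], two candidate base kernels, every other point as subsample.
X = np.linspace(0.0, 1.0, 8)[:, None]
y = np.sin(2.0 * np.pi * X).ravel()
idx_half = np.array([0, 2, 4, 6])
loss = sparse_kf_loss(X, y, [1.0, 0.5], [0.2, 1.0], idx_half, l1_penalty=0.1)
```

Read in two-part MDL terms, `rho` plays the role of a data-fit term and the penalty plays the role of a model-complexity term. That reading is an informal gloss on the abstract, not a statement of the paper's theorem. In practice the weights and lengthscales would be optimised over many random subsamples; the sketch evaluates the loss once for a fixed subsample.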