
Kernel Flows: from learning kernels from data into the abyss (1808.04475v2)

Published 13 Aug 2018 in stat.ML and cs.LG

Abstract: Learning can be seen as approximating an unknown function by interpolating the training data. Kriging offers a solution to this problem based on the prior specification of a kernel. We explore a numerical approximation approach to kernel selection/construction based on the simple premise that a kernel must be good if the number of interpolation points can be halved without significant loss in accuracy (measured using the intrinsic RKHS norm $\|\cdot\|$ associated with the kernel). We first test and motivate this idea on a simple problem of recovering the Green's function of an elliptic PDE (with inhomogeneous coefficients) from the sparse observation of one of its solutions. Next we consider the problem of learning non-parametric families of deep kernels of the form $K_1(F_n(x),F_n(x'))$ with $F_{n+1}=(I_d+\epsilon G_{n+1})\circ F_n$ and $G_{n+1} \in \operatorname{Span}\{K_1(F_n(x_i),\cdot)\}$. With the proposed approach constructing the kernel becomes equivalent to integrating a stochastic data driven dynamical system, which allows for the training of very deep (bottomless) networks and the exploration of their properties. These networks learn by constructing flow maps in the kernel and input spaces via incremental data-dependent deformations/perturbations (appearing as the cooperative counterpart of adversarial examples) and, at profound depths, they (1) can achieve accurate classification from only one data point per class (2) appear to learn archetypes of each class (3) expand distances between points that are in different classes and contract distances between points in the same class. For kernels parameterized by the weights of Convolutional Neural Networks, minimizing approximation errors incurred by halving random subsets of interpolation points, appears to outperform training (the same CNN architecture) with relative entropy and dropout.

Citations (81)

Summary

  • The paper introduces a novel evaluation criterion for kernels, demonstrating that a good kernel maintains performance even with halved interpolation points.
  • The paper details an iterative Kernel Flow algorithm that uses random sampling, gradient descent, and CNN-based parameterization to optimize kernel learning.
  • The paper validates its approach on PDE models and MNIST datasets, achieving competitive accuracy with significantly reduced training data.

Essay on "Kernel Flows: from learning kernels from data into the abyss"

The paper "Kernel Flows: from learning kernels from data into the abyss" by Houman Owhadi and Gene Ryan Yoo proposes a novel approach to kernel selection and construction within the framework of Gaussian Process Regression and numerical homogenization. The methodology, termed as "Kernel Flows" (KF), leverages the premise that a good kernel should maintain accuracy even if the interpolation set is reduced by half. This approach aligns with the intrinsic Reproducing Kernel Hilbert Spaces (RKHS) norms and is designed to be computationally efficient, allowing for deep, potentially bottomless, kernel networks.

Summary of Contributions

  1. Evaluation Criterion for Kernels: The paper introduces a new criterion for evaluating kernels: a kernel is considered good if it continues to perform well when the number of interpolation points is halved. This is quantified by the relative error $\rho$, measured in the RKHS norm, between the interpolants computed from the full and halved sets.
  2. Algorithmic Framework: The Kernel Flow algorithm is presented as an iterative framework for learning kernels. Each iteration samples a random batch of data and a half-subset of it, performs a gradient step on the kernel parameters with respect to $\rho$, and updates the kernel (see the sketch after this list). The use of random sampling helps mitigate overfitting by averaging out noise and spurious correlations.
  3. Numerical Performance: Through experiments with PDE models and the MNIST and Fashion-MNIST datasets, the Kernel Flow approach demonstrates marked improvements in classification even with a very small subset of interpolation points. The algorithm achieves competitive test accuracies using dramatically fewer training points, which suggests substantial efficiency and robustness.
  4. Kernel Parameterization via CNN: The approach extends to deep Convolutional Neural Networks (CNNs): kernels are parameterized by the weights of a CNN, whose layers can be seen as kernel operations for data interpolation, and are trained by minimizing the halving error $\rho$, offering a kernel-based account of neural network effectiveness.
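
To make the iteration concrete, below is a minimal sketch of one Kernel Flow step for a Gaussian kernel with a single bandwidth parameter. The function names, the finite-difference gradient, and the toy data are assumptions made for illustration; the paper works with richer parametric and non-parametric kernel families.

```python
# Minimal Kernel Flow iteration sketch (illustrative, not the authors' code).
# Assumption: a Gaussian kernel k_gamma(x, x') = exp(-gamma * ||x - x'||^2)
# with a single bandwidth gamma; the gradient of rho is taken numerically
# for brevity. Names (kernel_matrix, rho, kernel_flow_step) are hypothetical.
import numpy as np

def kernel_matrix(X, gamma, jitter=1e-8):
    """Gaussian kernel Gram matrix with a small jitter for invertibility."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * d2) + jitter * np.eye(len(X))

def rho(gamma, Xb, yb, Xc, yc):
    """Relative RKHS-norm error incurred by interpolating on the half-subset."""
    num = yc @ np.linalg.solve(kernel_matrix(Xc, gamma), yc)
    den = yb @ np.linalg.solve(kernel_matrix(Xb, gamma), yb)
    return 1.0 - num / den

def kernel_flow_step(gamma, X, y, batch_size=64, lr=0.1, eps=1e-4, rng=None):
    """One stochastic step: sample a batch, halve it, descend rho w.r.t. gamma."""
    if rng is None:
        rng = np.random.default_rng()
    b = rng.choice(len(X), size=batch_size, replace=False)
    c = rng.choice(b, size=batch_size // 2, replace=False)
    Xb, yb, Xc, yc = X[b], y[b], X[c], y[c]
    # Finite-difference gradient of rho with respect to gamma (for simplicity).
    g = (rho(gamma + eps, Xb, yb, Xc, yc) - rho(gamma - eps, Xb, yb, Xc, yc)) / (2 * eps)
    return gamma - lr * g

# Toy usage: learn the bandwidth on synthetic 1-D regression data.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(2 * X[:, 0]) + 0.05 * rng.standard_normal(500)
gamma = 1.0
for _ in range(200):
    gamma = kernel_flow_step(gamma, X, y, rng=rng)
```

In practice the gradient of $\rho$ would be computed analytically or by automatic differentiation, and the kernel parameters can be the weights of a CNN rather than a single bandwidth.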

Theoretical Implications

The paper's central idea, equating learning with the integration of a data-driven dynamical system, opens new avenues for understanding and designing deep learning architectures. By demonstrating that kernel learning can be cast as such a dynamical system, it suggests that learning need not rely on backpropagation through a fixed parametric architecture but can instead proceed through non-parametrically constructed data flows.
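
As a rough illustration of this flow-map viewpoint, the sketch below iterates the incremental deformation $F_{n+1}=(I_d+\epsilon G_{n+1})\circ F_n$ from the abstract, applied directly to a point cloud. The Gaussian base kernel $K_1$ and the fixed coefficient matrix `C` are placeholders: in the paper the perturbation direction is recomputed from the data at each step rather than held fixed.

```python
# Illustrative sketch of the incremental flow-map deformation from the abstract:
# F_{n+1} = (I + eps * G_{n+1}) o F_n, with G_{n+1} in span{K_1(F_n(x_i), .)}.
# The base kernel K_1, the fixed coefficients C, and the function names are
# assumptions for illustration only.
import numpy as np

def k1(A, B, gamma=1.0):
    """Gaussian base kernel K_1 evaluated between two point sets."""
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * d2)

def flow_step(Z, C, eps=0.01):
    """One deformation: each point moves by eps * sum_j K_1(z, z_j) * C[j]."""
    # G(z) = sum_j K_1(z, z_j) C[j], a vector field spanned by kernel sections.
    G = k1(Z, Z) @ C          # shape (n_points, dim)
    return Z + eps * G

# Toy usage: repeatedly deform a 2-D point cloud Z with fixed coefficients C.
rng = np.random.default_rng(0)
Z = rng.standard_normal((100, 2))
C = 0.1 * rng.standard_normal((100, 2))
for n in range(50):
    Z = flow_step(Z, C)       # in the paper, the coefficients are data-dependent at each step
```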

Practical Implications

Practically, the Kernel Flow method shows impressive results in reducing computational cost and data requirements. The ability to learn kernels induced by CNN architectures provides a potential pathway to improving neural networks without the exhaustive hyperparameter tuning and architectural guesswork traditionally associated with deep learning.

Future Directions

The work raises intriguing questions and future possibilities in AI development:

  • Dynamical System Perspective: Further work on interpreting machine learning as the integration of dynamical systems could open applications to evolving complex systems.
  • Broader Kernel Applications: Extending kernel flows to other domains, such as signal processing, reinforcement learning, or unsupervised clustering, could yield new application methodologies.
  • Architectural Extensions: Recasting other deep learning architectures in the kernel perspective might deliver generalization advantages analogous to those observed with kernel flows.

Conclusion

The "Kernel Flows" paper presents a compelling framework for understanding and improving kernel-based learning. Its ability to derive significant insights and practical results with minimal data makes it a noteworthy contribution, warranting further investigation into its theoretical and real-world extensions. Through a synergy of numerical homogenization and stochastic dynamic systems, the work carves a path for robust and efficient learning mechanisms that can transcend traditional neural network paradigms.
