Deep Kernel Learning (1511.02222v1)

Published 6 Nov 2015 in cs.LG, cs.AI, stat.ME, and stat.ML

Abstract: We introduce scalable deep kernels, which combine the structural properties of deep learning architectures with the non-parametric flexibility of kernel methods. Specifically, we transform the inputs of a spectral mixture base kernel with a deep architecture, using local kernel interpolation, inducing points, and structure exploiting (Kronecker and Toeplitz) algebra for a scalable kernel representation. These closed-form kernels can be used as drop-in replacements for standard kernels, with benefits in expressive power and scalability. We jointly learn the properties of these kernels through the marginal likelihood of a Gaussian process. Inference and learning cost $O(n)$ for $n$ training points, and predictions cost $O(1)$ per test point. On a large and diverse collection of applications, including a dataset with 2 million examples, we show improved performance over scalable Gaussian processes with flexible kernel learning models, and stand-alone deep architectures.

Citations (831)

Summary

  • The paper introduces a novel framework that integrates deep neural network transformations with spectral mixture kernels to enhance kernel expressiveness.
  • It proposes scalable techniques such as local kernel interpolation, inducing points, and structure exploiting algebra to efficiently handle large datasets.
  • The method demonstrates superior performance on UCI regression, face orientation extraction, digit magnitude recovery, and challenging discontinuity recovery tasks.

Deep Kernel Learning: An Integrative Approach for Scalability and Expressiveness

The paper "Deep Kernel Learning" presents a novel approach that marries the structural strengths of deep learning architectures with the non-parametric flexibility of kernel methods. The proposed technique involves transforming the inputs of a spectral mixture (SM) base kernel using a deep architecture and employing advanced algebraic methods like local kernel interpolation, inducing points, and structure exploiting algebra for scalable kernel representation.

Core Contributions and Techniques

The authors introduce scalable deep kernels that can serve as drop-in replacements for standard kernels in Gaussian processes (GPs). These kernels benefit from both the expressive power of deep neural networks and the scalability of advanced kernel methods. Specifically, the paper focuses on:

  1. Input Transformation via Deep Architecture: The inputs to a base kernel are transformed using deep feedforward fully-connected and convolutional networks. This transformation enhances the expressive capabilities of the kernel function beyond traditional parametric forms.
  2. Spectral Mixture (SM) Base Kernel: The use of the spectral mixture kernel, which accommodates a broader range of dependencies by capturing multiple frequencies and bandwidths, further amplifies the representational power of the kernel.
  3. Scalable Kernel Representation: Techniques such as local kernel interpolation, inducing points, and structure exploiting algebra (Kronecker and Toeplitz methods) are employed to maintain the scalability of the kernel representation. This results in computational efficiencies, making the model feasible for large datasets.
  4. Unified Learning Framework: All parameters, both the network weights and the kernel hyperparameters, are learned jointly by maximizing the marginal likelihood of the Gaussian process, with derivatives of the marginal likelihood backpropagated through the kernel and the deep architecture. This avoids the two-stage practice of first training a network and then fitting a GP to its fixed features; a minimal end-to-end training sketch follows this list.
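
As a rough illustration of how these pieces fit together, the sketch below uses GPyTorch (not the authors' original code, which predates that library). The toy data, the small fully-connected feature extractor, the number of mixture components, and the optimizer settings are all illustrative assumptions, and for brevity it runs exact GP inference rather than the KISS-GP approximations (local kernel interpolation, inducing points, Kronecker/Toeplitz algebra) that give the paper its $O(n)$ training and $O(1)$ prediction costs.

```python
import torch
import gpytorch

# Toy regression data (illustrative only): 500 points in 4 dimensions.
train_x = torch.randn(500, 4)
train_y = torch.sin(train_x.sum(dim=-1)) + 0.1 * torch.randn(500)

# Deep feature extractor g(x, w): a small fully-connected network mapping
# inputs into a 2-dimensional feature space (layer sizes are illustrative;
# the paper uses much deeper networks for its large-scale experiments).
feature_extractor = torch.nn.Sequential(
    torch.nn.Linear(4, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, 2),
)

class DeepKernelGP(gpytorch.models.ExactGP):
    """GP whose covariance is a spectral mixture kernel on learned features."""

    def __init__(self, train_x, train_y, likelihood, feature_extractor):
        super().__init__(train_x, train_y, likelihood)
        self.feature_extractor = feature_extractor
        self.mean_module = gpytorch.means.ConstantMean()
        # Spectral mixture base kernel on the 2-D feature space.
        self.covar_module = gpytorch.kernels.SpectralMixtureKernel(
            num_mixtures=4, ard_num_dims=2
        )

    def forward(self, x):
        z = self.feature_extractor(x)  # k(x, x') = k_SM(g(x, w), g(x', w))
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(z), self.covar_module(z)
        )

likelihood = gpytorch.likelihoods.GaussianLikelihood()
model = DeepKernelGP(train_x, train_y, likelihood, feature_extractor)

# Joint learning: one objective (the GP marginal likelihood) trains the
# network weights, kernel hyperparameters, and noise level together.
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

model.train()
likelihood.train()
for _ in range(100):
    optimizer.zero_grad()
    loss = -mll(model(train_x), train_y)
    loss.backward()
    optimizer.step()

# Predictive distribution at new inputs.
model.eval()
likelihood.eval()
with torch.no_grad(), gpytorch.settings.fast_pred_var():
    preds = likelihood(model(torch.randn(10, 4)))
    print(preds.mean)
```

For the scalability reported in the paper, the base kernel would additionally be wrapped in the KISS-GP structure; in GPyTorch, GridInterpolationKernel plays a similar role, and the hyperparameters above would normally be tuned per dataset.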

Evaluations and Results

The paper evaluates the proposed method across several benchmarks:

  • UCI Regression Datasets:

The method consistently outperforms both scalable Gaussian processes with flexible kernel learning and standalone deep neural networks across a diverse range of datasets. Notably, these gains are achieved without a significant increase in computational overhead.

  • Face Orientation Extraction:

Using a convolutional neural network (CNN) as the kernel's input transformation, the model predicts face orientations more accurately than both a deep belief network combined with a GP (DBN-GP) and a standalone CNN. The task also illustrates the scalability and practical value of the approach on high-dimensional image data.

  • Digit Magnitude Recovery:

The method successfully recovers the magnitude of handwritten digits from images, outperforming baselines such as DBN-GP and standalone CNNs. This further validates the model's capability to learn meaningful representations from raw data.

  • Step Function Recovery:

The model exhibits proficiency in capturing discontinuities by recovering a step function, an inherently challenging problem for conventional GPs due to implicit smoothness assumptions.

Implications and Future Directions

The implications of this research are both practical and theoretical:

  • Practical Implications:

The proposed approach enhances the flexibility and expressive power of kernel methods, enabling them to handle complex, high-dimensional data more effectively. This is achieved alongside scalable training and inference, rendering the method suitable for large-scale applications.

  • Theoretical Implications:

By integrating adaptive deep learning transformations into the kernel framework, the method transcends the limitations of traditional Euclidean distance-based metrics, allowing for more expressive similarity measures. This unifying perspective opens new avenues for combined research in neural networks and kernel methods.
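
As a concrete illustration (using an RBF base kernel for simplicity, rather than the paper's spectral mixture kernel), the induced similarity measures Euclidean distance in the learned feature space rather than in the input space:

$$k_{\mathrm{deep}}(x, x') = \exp\!\Big(-\frac{\|g(x, w) - g(x', w)\|^2}{2\ell^2}\Big),$$

so the network $g(\cdot, w)$ acts as a learned, task-adapted metric on the inputs, while the GP retains its non-parametric, probabilistic treatment of the function values.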

Final Remarks

The deep kernel learning method presented in this paper offers a compelling synthesis of the non-parametric flexibility of kernel methods and the adaptive representational power of neural networks. By leveraging efficient kernel representations and joint learning through the marginal likelihood, the approach provides a scalable, robust solution for diverse, real-world machine learning tasks. This integration holds promise for future advances in areas such as high-dimensional classification, reinforcement learning, and Bayesian optimization, and encourages further research into hybrid models that harness the strengths of both neural and kernel-based frameworks.