
Gaussian Quadrature for Kernel Features (1709.02605v3)

Published 8 Sep 2017 in cs.LG

Abstract: Kernel methods have recently attracted resurgent interest, showing performance competitive with deep neural networks in tasks such as speech recognition. The random Fourier features map is a technique commonly used to scale up kernel machines, but employing the randomized feature map means that $O(\epsilon^{-2})$ samples are required to achieve an approximation error of at most $\epsilon$. We investigate some alternative schemes for constructing feature maps that are deterministic, rather than random, by approximating the kernel in the frequency domain using Gaussian quadrature. We show that deterministic feature maps can be constructed, for any $\gamma > 0$, to achieve error $\epsilon$ with $O(e^{e^\gamma} + \epsilon^{-1/\gamma})$ samples as $\epsilon$ goes to 0. Our method works particularly well with sparse ANOVA kernels, which are inspired by the convolutional layer of CNNs. We validate our methods on datasets in different domains, such as MNIST and TIMIT, showing that deterministic features are faster to generate and achieve accuracy comparable to the state-of-the-art kernel methods based on random Fourier features.

Authors (3)
  1. Tri Dao (47 papers)
  2. Christopher De Sa (77 papers)
  3. Christopher Ré (194 papers)
Citations (47)

Summary

  • The paper introduces a deterministic feature map construction using Gaussian quadrature, achieving exponentially small kernel approximation errors.
  • It demonstrates that for sparse ANOVA kernels, the method requires significantly fewer samples than random Fourier features, enhancing scalability.
  • Empirical tests on datasets like MNIST and TIMIT show that the deterministic features are faster to generate while matching the accuracy of random Fourier feature baselines in real-world applications.

Gaussian Quadrature for Kernel Features

The paper "Gaussian Quadrature for Kernel Features" explores alternative methodologies to enhance the scalability and accuracy of kernel machines, specifically through deterministic feature map construction using Gaussian quadrature. The authors challenge the prevailing use of random Fourier features and assert the merits of deterministic feature maps, especially in the context of sparse ANOVA kernels.

Kernel machines are instrumental in processing large datasets, where representing inputs through a kernel function underpins classification tasks. Conventional kernel methods, however, scale poorly because they must compute and store the Gram matrix, whose size grows quadratically with the number of training examples. Random Fourier features have been a potent remedy, approximating the kernel so that scalable linear methods can be applied. Despite its efficacy, the random Fourier approach offers only probabilistic, rather than deterministic, accuracy guarantees.
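
For context, here is a minimal sketch of the random Fourier features baseline for the Gaussian kernel, following the standard Rahimi-Recht construction; the feature count, bandwidth, and test data below are illustrative assumptions, not values from the paper.

```python
import numpy as np

def random_fourier_features(X, num_features=512, sigma=1.0, seed=0):
    """Random Fourier features for the Gaussian (RBF) kernel
    k(x, y) = exp(-||x - y||^2 / (2 * sigma^2)).

    Frequencies are sampled from the kernel's spectral density (a Gaussian),
    so z(x) @ z(y).T approximates k(x, y) in expectation.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(scale=1.0 / sigma, size=(d, num_features))  # spectral samples
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)       # random phases
    return np.sqrt(2.0 / num_features) * np.cos(X @ W + b)

# Sanity check: feature inner products approximate the exact kernel matrix.
X = np.random.default_rng(1).normal(size=(5, 3))
Z = random_fourier_features(X, num_features=4096, sigma=1.0)
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K_exact = np.exp(-sq_dists / 2.0)
print(np.abs(Z @ Z.T - K_exact).max())  # small, shrinking like O(D^{-1/2})
```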

The authors propose a deterministic scheme that leverages Gaussian quadrature to construct feature maps without randomness. For any $\gamma > 0$, the method achieves error $\epsilon$ with $O(e^{e^\gamma} + \epsilon^{-1/\gamma})$ samples as $\epsilon$ approaches zero, and it proves particularly advantageous for sparse ANOVA kernels. Sparse ANOVA kernels, which are structurally akin to the convolutional layers of CNNs, benefit markedly from the deterministic approach, requiring fewer samples than random Fourier features to reach a given approximation error.
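
To make the deterministic construction concrete, the sketch below approximates the Gaussian kernel with a tensor product of one-dimensional Gauss-Hermite rules applied to the kernel's spectral density, i.e. the dense-grid idea. Function names, grid sizes, and data are illustrative assumptions; the paper also develops sparser and reweighted quadrature variants that avoid the grid's exponential growth in dimension.

```python
import itertools
import numpy as np

def gauss_hermite_features(X, points_per_dim=3):
    """Deterministic feature map for the Gaussian kernel
    k(x, y) = exp(-||x - y||^2 / 2): a tensor product of 1-D
    Gauss-Hermite rules over the kernel's Gaussian spectral density.
    (Illustrative dense-grid sketch only.)
    """
    n, d = X.shape
    t, w = np.polynomial.hermite.hermgauss(points_per_dim)  # 1-D nodes/weights
    # Change of variables omega = sqrt(2) * t maps the e^{-t^2} weight of
    # Gauss-Hermite onto the standard normal spectral density.
    nodes_1d, weights_1d = np.sqrt(2.0) * t, w / np.sqrt(np.pi)

    omegas, weights = [], []
    for idx in itertools.product(range(points_per_dim), repeat=d):
        omegas.append([nodes_1d[i] for i in idx])
        weights.append(np.prod([weights_1d[i] for i in idx]))
    omegas, weights = np.array(omegas), np.array(weights)

    proj = X @ omegas.T                    # shape (n, points_per_dim ** d)
    scale = np.sqrt(weights)
    return np.hstack([scale * np.cos(proj), scale * np.sin(proj)])

# The error is deterministic and shrinks as the grid grows.
X = np.random.default_rng(0).normal(size=(5, 3))
Phi = gauss_hermite_features(X, points_per_dim=8)
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
print(np.abs(Phi @ Phi.T - np.exp(-sq_dists / 2.0)).max())
```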

The contributions of the paper are noteworthy:

  1. Deterministic Feature Map Construction: The authors present a methodology for constructing deterministic feature maps for subgaussian kernels, achieving exponentially small approximation errors.
  2. Sparse ANOVA Kernel Efficiency: For sparse ANOVA kernels, the proposed deterministic feature maps require significantly fewer samples than traditional random Fourier features, reducing both approximation error and feature dimension (a minimal sketch of such a feature map follows this list).
  3. Experimental Validation: Empirical tests on datasets such as MNIST and TIMIT demonstrate the efficacy of deterministic features, not only in terms of speed but also in matching the accuracy of state-of-the-art random kernel methods.
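
Expanding on the second contribution above, a sparse ANOVA kernel sums a low-dimensional base kernel over small groups of coordinates, much as a convolutional filter scans overlapping patches. The following sketch shows one simple way to assemble a feature map for such a kernel by concatenating per-patch feature maps; the patch layout and the base map (plain random Fourier features here) are hypothetical choices for illustration, not the paper's exact construction.

```python
import numpy as np

def sparse_anova_features(X, index_sets, base_feature_map):
    """Feature map for a sparse ANOVA kernel
    k(x, y) = sum_S k_base(x_S, y_S), where each S is a small set of
    coordinates (analogous to the patches seen by a convolutional filter).
    Concatenating per-patch feature maps yields a map for the summed kernel.
    """
    return np.hstack([base_feature_map(X[:, list(S)]) for S in index_sets])

# Hypothetical usage: overlapping 2-coordinate "patches" with a random
# Fourier base map; the deterministic Gauss-Hermite map sketched earlier
# could be dropped in instead.
rng = np.random.default_rng(0)

def rff(Z, D=64, sigma=1.0, rng=rng):
    W = rng.normal(scale=1.0 / sigma, size=(Z.shape[1], D))
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)
    return np.sqrt(2.0 / D) * np.cos(Z @ W + b)

X = rng.normal(size=(5, 6))
patches = [(i, i + 1) for i in range(5)]      # overlapping 1-D patches
Phi = sparse_anova_features(X, patches, rff)
print(Phi.shape)                              # (5, 5 * 64)
```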

Implications of this research are manifold. On the practical front, deterministic feature maps could revolutionize areas where predictive accuracy and computational efficiency are paramount, potentially influencing real-time processing applications. Theoretically, this work rekindles discussions around the necessity of randomness in kernel approximation, urging further exploration into deterministic approaches across other types of kernels. As AI continues to advance, the constructs introduced in this paper may spur new developments in machine learning, offering robust alternatives to traditional methodologies.

In conclusion, "Gaussian Quadrature for Kernel Features" makes convincing strides in improving kernel method efficiency and accuracy. It lays the groundwork for the broader application of deterministic feature maps, challenging the status quo and opening pathways for future innovations in AI.
