
FALKON: An Optimal Large Scale Kernel Method (1705.10958v3)

Published 31 May 2017 in stat.ML and cs.LG

Abstract: Kernel methods provide a principled way to perform non linear, nonparametric learning. They rely on solid functional analytic foundations and enjoy optimal statistical properties. However, at least in their basic form, they have limited applicability in large scale scenarios because of stringent computational requirements in terms of time and especially memory. In this paper, we take a substantial step in scaling up kernel methods, proposing FALKON, a novel algorithm that allows to efficiently process millions of points. FALKON is derived combining several algorithmic principles, namely stochastic subsampling, iterative solvers and preconditioning. Our theoretical analysis shows that optimal statistical accuracy is achieved requiring essentially $O(n)$ memory and $O(n\sqrt{n})$ time. An extensive experimental analysis on large scale datasets shows that, even with a single machine, FALKON outperforms previous state of the art solutions, which exploit parallel/distributed architectures.

Citations (192)

Summary

  • The paper introduces FALKON, a novel algorithm that makes kernel methods, particularly kernel ridge regression (KRR), efficient and scalable for processing large datasets with millions of data points.
  • FALKON employs stochastic subsampling, iterative solvers, and Nyström-based preconditioning to reduce computational complexity to O(n√n) time and O(n) memory while maintaining optimal statistical accuracy.
  • Empirical results demonstrate FALKON's superior performance and computational efficiency over existing large-scale kernel methods, allowing it to compete with deep neural networks on large data even on a single machine.

FALKON: An Optimal Large Scale Kernel Method

The paper "FALKON: An Optimal Large Scale Kernel Method" introduces FALKON, a novel algorithm designed to efficiently scale kernel methods to large datasets, addressing the significant computational challenges associated with kernel ridge regression (KRR). Kernel methods are renowned for their ability to perform nonlinear, nonparametric learning, making them a powerful tool in supervised learning tasks. Despite their robust theoretical foundation and optimal statistical properties, the computational burden of kernel methods typically limits their application to smaller datasets. The authors of this paper propose FALKON as a solution to this limitation, enabling kernel methods to process datasets with millions of data points effectively and efficiently.

Methodology and Algorithmic Innovations

FALKON combines three algorithmic principles: stochastic subsampling, iterative solvers, and preconditioning. A core innovation of FALKON is its approach to preconditioning, which reduces the condition number of the linear system associated with KRR and thereby speeds up the convergence of the iterative solver. By using the Nyström method both to approximate the problem and to compute the preconditioner, FALKON achieves computational efficiency without compromising statistical accuracy. The algorithm reduces the time complexity to O(n√n) and the memory requirement to O(n). These improvements make FALKON an attractive alternative to existing methods, particularly in terms of computational resource utilization and scalability.
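
The sketch below illustrates the general recipe — Nyström subsampling of $m \ll n$ centers plus a preconditioned conjugate-gradient solve — in plain NumPy/SciPy. It is a minimal illustration, not the authors' implementation: the Gaussian kernel, the simple inverse preconditioner, and all hyperparameter values are assumptions made for this example, and the paper's actual preconditioner is built more cheaply from Cholesky factors of the $m \times m$ kernel block, with the large kernel matrix processed in blocks rather than stored.

```python
import numpy as np
from scipy.sparse.linalg import cg

def gaussian_kernel(A, B, sigma=1.0):
    """Pairwise Gaussian kernel matrix between the rows of A and B."""
    d2 = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-d2 / (2.0 * sigma**2))

def nystrom_krr_pcg(X, y, m=500, lam=1e-3, sigma=1.0, maxiter=30, seed=0):
    """Nyström-subsampled KRR solved by preconditioned conjugate gradient (illustrative)."""
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(n, size=min(m, n), replace=False)]   # stochastic subsampling
    K_nm = gaussian_kernel(X, centers, sigma)                    # n x m (FALKON streams this in blocks)
    K_mm = gaussian_kernel(centers, centers, sigma)              # m x m
    # Nyström KRR normal equations: (K_nm^T K_nm / n + lam * K_mm) alpha = K_nm^T y / n
    H = K_nm.T @ K_nm / n + lam * K_mm
    b = K_nm.T @ y / n
    # Crude Nyström-based preconditioner: approximate H using only the m x m block.
    P = np.linalg.inv(K_mm @ K_mm / m + lam * K_mm + 1e-10 * np.eye(K_mm.shape[0]))
    alpha, _ = cg(H, b, M=P, maxiter=maxiter)                    # iterative solver
    return centers, alpha

# Usage: centers, alpha = nystrom_krr_pcg(X_train, y_train)
#        y_pred = gaussian_kernel(X_test, centers) @ alpha
```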

Computational Complexity and Statistical Guarantees

The paper thoroughly analyzes the computational complexity and statistical guarantees of FALKON. The authors demonstrate that optimal statistical accuracy is maintained even as the computational cost is reduced from O(n²) to O(n√n) kernel evaluations. The algorithm achieves this by combining the computational benefits of iterative solvers with the statistical robustness of Nyström subsampling. Theoretical results presented in the paper include bounds on the excess risk and conditions under which FALKON's preconditioning yields favorable convergence rates. Notably, FALKON requires only about log n iterations to reach a statistically accurate solution, far fewer than the polynomially many iterations typically needed by iterative solvers without preconditioning.
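
The intuition behind the log n iteration count can be seen from the textbook conjugate-gradient bound (standard CG analysis, not a statement taken from the paper): for a system with condition number $\kappa$,

$$\|\alpha_t - \alpha_\star\|_{H} \le 2\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^{t}\|\alpha_0 - \alpha_\star\|_{H},$$

so if the preconditioner keeps $\kappa$ bounded by a constant, driving the optimization error down to the order of the statistical error (roughly $n^{-1/2}$) requires only $t = O(\log n)$ iterations.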

Empirical Evaluation

Extensive experiments on large-scale datasets demonstrate that FALKON outperforms state-of-the-art kernel methods, including those deployed on parallel and distributed architectures. The results highlight FALKON's superior prediction accuracy and computational efficiency even when executed on a single machine, enabling kernel methods to compete with, and potentially exceed, deep neural networks on large-scale data.

Implications

The implications are profound. FALKON not only democratizes the application of kernel methods in scenarios involving vast amounts of data but also suggests a compelling alternative to deep fully connected neural networks. By making large-scale kernel learning feasible on standard computational hardware, FALKON could broaden the scope and impact of machine learning applications.

Future Developments

The paper opens several avenues for future research. These include refining the sampling strategy to further improve computational efficiency and statistical accuracy, exploring scalability to even larger datasets, and extending FALKON to kernel-based methods beyond KRR. Integrating FALKON into real-world applications would also allow practitioners and researchers to exploit its potential in practical settings.

Conclusion

"FALKON: An Optimal Large Scale Kernel Method" represents a significant advancement in the field of kernel methods, presenting an algorithm that not only enhances computational efficiency but also maintains optimal learning performance. By addressing the longstanding challenges associated with scaling kernel methods to large data, this paper contributes a valuable tool that broadens the applicability and impact of kernel-based approaches in machine learning and related domains.