- The paper introduces a fully Bayesian formulation of the Relevance Vector Machine using variational inference to generate probabilistic predictions.
- It employs a factorized posterior distribution and variational optimization to achieve competitive error rates with significantly fewer kernel functions.
- Empirical results, including slightly better accuracy than SVMs and RVMs on the Ripley benchmark, demonstrate the VRVM's efficiency; scalability and extensions toward deep learning are flagged as directions for future work.
Variational Relevance Vector Machines: A Bayesian Approach
The paper presents an advance in sparse kernel machines by introducing the Variational Relevance Vector Machine (VRVM), an extension of the Relevance Vector Machine (RVM). RVMs are notable for offering a probabilistic framework that produces predictive distributions rather than mere point predictions, in contrast to Support Vector Machines (SVMs).
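For reference, both models make predictions through the same kernel expansion; the notation below follows the standard RVM presentation and is not specific to this paper's derivation:

```latex
% Shared functional form of SVM and RVM predictions:
y(\mathbf{x}) = \sum_{n=1}^{N} w_n K(\mathbf{x}, \mathbf{x}_n) + w_0
% The RVM additionally returns a Gaussian predictive distribution
% rather than a point prediction:
p(t_* \mid \mathbf{x}_*, \mathbf{t}) = \mathcal{N}\!\left(t_* \mid y(\mathbf{x}_*),\, \sigma_*^2\right)
```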
Key Contributions
The central contribution of this work is the formulation of the RVM within a fully Bayesian framework using variational inference. This is a significant departure from the original RVM formulation, which relied on type-II maximum likelihood estimation, also known as the 'evidence framework', to determine point estimates of the hyperparameters. By adopting a fully Bayesian perspective, the VRVM yields a posterior distribution over both the parameters and the hyperparameters, addressing a key limitation of traditional SVMs: non-probabilistic predictions.
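To make the contrast concrete, the two inference strategies can be written side by side; this is the standard formulation of the evidence framework versus a mean-field posterior approximation, assuming conjugate Gamma hyperpriors on the precisions α and τ:

```latex
% Evidence framework (original RVM): point estimates of hyperparameters
\hat{\boldsymbol{\alpha}}, \hat{\tau}
  = \arg\max_{\boldsymbol{\alpha}, \tau}
    \int p(\mathbf{t} \mid \mathbf{w}, \tau)\, p(\mathbf{w} \mid \boldsymbol{\alpha})\, d\mathbf{w}

% Fully Bayesian VRVM: approximate the joint posterior instead
p(\mathbf{w}, \boldsymbol{\alpha}, \tau \mid \mathbf{t})
  \approx Q_{\mathbf{w}}(\mathbf{w})\, Q_{\boldsymbol{\alpha}}(\boldsymbol{\alpha})\, Q_{\tau}(\tau)
```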
Technical Approach
The VRVM model retains the same functional form as the SVM but applies a factorized approximation to the posterior distribution, making variational inference tractable. This approach involves:
- Assumption of Factorization: A factorized form Q(w, α, τ) = Q_w(w) Q_α(α) Q_τ(τ) is assumed over the weights w, the weight precisions α, and the noise precision τ, which yields a tractable lower bound on the log marginal likelihood.
- Variational Optimization: The variational lower bound is maximized over the factorized distributions by cyclically re-estimating each factor, so model complexity is controlled not by an explicit feature-selection step but by continuous hyperparameters whose posteriors drive most weights toward zero, inducing sparsity (a minimal sketch of these updates follows this list).
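As a rough illustration of these updates for the regression case, here is a minimal NumPy sketch of the mean-field iterations. The function name vrvm_regression, the hyperprior values, and the fixed iteration count are illustrative assumptions, not the paper's code:

```python
# Minimal mean-field sketch for the VRVM regression model:
#   t = Phi @ w + noise,  w_i ~ N(0, 1/alpha_i),
#   alpha_i ~ Gamma(a0, b0),  noise precision tau ~ Gamma(c0, d0).
import numpy as np

def vrvm_regression(Phi, t, n_iter=200, a0=1e-6, b0=1e-6, c0=1e-6, d0=1e-6):
    """Returns the posterior mean/covariance of w plus E[alpha] and E[tau]."""
    N, M = Phi.shape
    E_alpha = np.ones(M)          # initial E[alpha_i]
    E_tau = 1.0                   # initial E[tau]
    PhiT_Phi = Phi.T @ Phi
    PhiT_t = Phi.T @ t
    for _ in range(n_iter):
        # Q_w(w) = N(mu, Sigma): Gaussian given the expected precisions.
        Sigma = np.linalg.inv(E_tau * PhiT_Phi + np.diag(E_alpha))
        mu = E_tau * Sigma @ PhiT_t
        # Q_alpha: independent Gamma factors; each update uses E[w_i^2].
        E_w2 = mu**2 + np.diag(Sigma)
        E_alpha = (a0 + 0.5) / (b0 + 0.5 * E_w2)
        # Q_tau: Gamma factor; the update uses E[||t - Phi w||^2].
        resid = t - Phi @ mu
        E_err = resid @ resid + np.trace(PhiT_Phi @ Sigma)
        E_tau = (c0 + 0.5 * N) / (d0 + 0.5 * E_err)
    return mu, Sigma, E_alpha, E_tau
```

Because each update is coordinate ascent on the variational lower bound, the bound increases monotonically; in practice one would monitor it for convergence rather than run a fixed number of iterations.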
Empirical Results
The effectiveness of the VRVM is demonstrated on both synthetic and real-world datasets for regression and classification. For regression tasks, VRVMs achieve error rates competitive with SVMs while using significantly fewer kernel functions. In classification tasks, VRVMs similarly maintain accuracy while reducing model complexity. Notably, on Ripley's synthetic dataset, an established benchmark, VRVMs achieve slightly better accuracy than both SVMs and the original RVM with a much smaller kernel set, highlighting the efficiency of the Bayesian framework in identifying the relevant data vectors.
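To see this sparsity mechanism at work, here is a hypothetical demo built on the vrvm_regression sketch above: synthetic sinc data, a Gaussian-kernel design matrix with one basis function per training point, and a count of how many weights survive pruning. The kernel width and pruning threshold are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-10, 10, size=100)
t = np.sinc(X / np.pi) + 0.1 * rng.standard_normal(100)  # noisy sin(x)/x

# Design matrix: one Gaussian kernel per training point plus a bias column.
K = np.exp(-0.5 * (X[:, None] - X[None, :]) ** 2 / 2.0 ** 2)
Phi = np.column_stack([np.ones_like(X), K])

mu, Sigma, E_alpha, E_tau = vrvm_regression(Phi, t)
retained = int(np.sum(E_alpha < 1e3))  # large E[alpha_i] => weight pruned
print(f"kernels retained: {retained} of {Phi.shape[1]}")
```

A large expected precision E[alpha_i] means the posterior over w_i is concentrated at zero, so the corresponding kernel contributes nothing; the handful of surviving columns play the role of the relevance vectors.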
Implications for Future Research
The introduction of the VRVM is a step toward reconciling the efficiency and sparsity of SVM-like algorithms with the strengths of Bayesian methods, particularly in modeling uncertainty. This work suggests several directions for future exploration:
- Scalability: Since the computational cost of variational Bayesian methods can be substantial, especially on large datasets, developing more efficient inference algorithms or exploiting optimization tricks could make VRVMs more widely applicable.
- Extension to Deep Learning: Considering the importance of uncertainty in deep learning, future work could investigate extensions of the VRVM principles to neural networks, potentially enhancing their interpretability and robustness to overfitting on smaller datasets.
- Further Integration with Probabilistic Graphical Models: Exploitation of variational Bayesian methods in more complex environments that involve multi-modal data or hierarchical dependencies may yield richer probabilistic models.
The VRVM stands as a compelling alternative to traditional kernel-based methods, unifying the merits of sparsity and full predictive distributions and providing a robust framework for tasks that demand both high accuracy and uncertainty quantification.