An Analysis of "Doubly Stochastic Variational Inference for Deep Gaussian Processes"
The paper, authored by Hugh Salimbeni and Marc Peter Deisenroth, presents a new approach to the inference problem in Deep Gaussian Processes (DGPs), termed doubly stochastic variational inference. DGPs extend Gaussian Processes (GPs) by composing multiple GP layers, analogous to stacking layers in a deep neural network, yielding a more expressive model that retains the GP's uncertainty quantification. Inference in DGPs is notably challenging, however, because the outputs of one layer form the inputs of the next, creating strong dependencies and correlations between layers.
Key Contributions
The authors address a key limitation of existing variational inference methods for DGPs: the common practice of forcing the approximate posterior to be independent between layers. This assumption fails to capture the correlations between layers and can lead to suboptimal performance, particularly as model depth and complexity increase. The paper introduces a doubly stochastic variational inference scheme that avoids the between-layer independence assumption and thereby retains the full dependency structure of the model.
The proposed method uses a sparse inducing-point approximation within each layer to keep computation tractable, while preserving the correlations across layers. The variational posterior shares the same non-parametric nature as the original DGP, yet its cost can be controlled through the number of inducing points and the minibatch size, which keeps training practical even on very large datasets.
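Concretely, and with notation lightly adapted from the paper, the variational posterior keeps the exact conditionals between layers and places a Gaussian only over the inducing outputs U^l of each layer:

\[
q\big(\{F^l, U^l\}_{l=1}^{L}\big) \;=\; \prod_{l=1}^{L} p\big(F^l \mid U^l;\, F^{l-1}, Z^{l-1}\big)\, q(U^l),
\qquad q(U^l) = \mathcal{N}\big(U^l \mid m^l, S^l\big),
\]

with F^0 = X. Because each U^l can be marginalized analytically, every layer yields a closed-form Gaussian over its outputs given its inputs, so the posterior over the layer functions is obtained by sampling layer by layer rather than by assuming independence between layers; only the q(U^l) are parametric.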
Methodological Details
The algorithm is termed "doubly stochastic" because inference relies on two sources of stochasticity: sampling from the variational posterior and subsampling the data in minibatches. The former propagates uncertainty through the layers while preserving the dependencies between them; the latter makes training scale to large datasets. The approach handles datasets ranging from a few hundred to a billion data points without significant compromise in prediction accuracy.
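To make the two sources of stochasticity concrete, the evidence lower bound (again with notation lightly adapted from the paper) is

\[
\mathcal{L} \;=\; \sum_{n=1}^{N} \mathbb{E}_{q(f_n^L)}\big[\log p\big(y_n \mid f_n^L\big)\big]
\;-\; \sum_{l=1}^{L} \mathrm{KL}\big[q(U^l)\,\|\,p(U^l; Z^{l-1})\big].
\]

The expectation is estimated by propagating each input through the layers with reparameterised samples (first source of stochasticity), and the sum over n is evaluated on a minibatch and rescaled by N/|B| (second source). The NumPy sketch below illustrates this estimator for a two-layer DGP with an RBF kernel, a Gaussian likelihood, and identity mean functions; it is a simplified illustration under those assumptions, not the authors' implementation, and all names (rbf, SparseGPLayer, elbo_estimate) are invented for this sketch.

import numpy as np

def rbf(X1, X2, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel matrix between two sets of points."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

class SparseGPLayer:
    """One DGP layer with M inducing points and q(u) = N(m, S)."""
    def __init__(self, Z, rng):
        self.Z = Z                          # inducing inputs, shape (M, D)
        self.m = np.zeros((len(Z), 1))      # variational mean of the inducing outputs
        self.L_S = 0.1 * np.eye(len(Z))     # Cholesky factor of the variational covariance S
        self.rng = rng

    def sample(self, X):
        """Draw one sample of f(X) from the marginal q(f(X)) via reparameterisation."""
        Kzz = rbf(self.Z, self.Z) + 1e-6 * np.eye(len(self.Z))
        Kzx = rbf(self.Z, X)
        A = np.linalg.solve(Kzz, Kzx)                 # K_zz^{-1} K_zx, shape (M, N)
        S = self.L_S @ self.L_S.T
        mean = (A.T @ self.m).ravel() + X[:, 0]       # identity mean function on the first input dim
        var = 1.0 - np.einsum('mn,mk,kn->n', A, Kzz - S, A)   # k(x, x) = 1 for this RBF
        eps = self.rng.standard_normal(len(X))
        return mean + np.sqrt(np.maximum(var, 1e-10)) * eps

    def kl(self):
        """KL[q(u) || p(u)] with prior p(u) = N(0, K_zz)."""
        Kzz = rbf(self.Z, self.Z) + 1e-6 * np.eye(len(self.Z))
        S = self.L_S @ self.L_S.T
        Kinv = np.linalg.inv(Kzz)
        quad = (self.m.T @ Kinv @ self.m).item()      # m^T K_zz^{-1} m
        return 0.5 * (np.trace(Kinv @ S) + quad - len(self.Z)
                      + np.linalg.slogdet(Kzz)[1] - np.linalg.slogdet(S)[1])

def elbo_estimate(layers, X_batch, y_batch, N_total, noise=0.1):
    """Doubly stochastic ELBO estimate: one posterior sample, one minibatch."""
    f = X_batch
    for layer in layers:                    # propagate a single sample through every layer
        f = layer.sample(f if f.ndim == 2 else f[:, None])
    log_lik = -0.5 * np.log(2 * np.pi * noise ** 2) - 0.5 * (y_batch - f) ** 2 / noise ** 2
    scale = N_total / len(X_batch)          # rescale the minibatch to the full dataset size
    return scale * log_lik.sum() - sum(layer.kl() for layer in layers)

rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(1000, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(1000)
layers = [SparseGPLayer(np.linspace(-3.0, 3.0, 20)[:, None], rng) for _ in range(2)]
batch = rng.choice(1000, size=100, replace=False)     # the minibatch (second source of stochasticity)
print(elbo_estimate(layers, X[batch], y[batch], N_total=1000))

In practice the ELBO estimate would be averaged over several posterior samples and maximized with a stochastic gradient optimizer over the kernel hyperparameters, the inducing inputs, and the variational parameters m and S of every layer.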
Empirical Results
Experiments on standard regression and classification benchmarks show that the proposed DGP outperforms both single-layer GPs and variational methods that enforce layer-wise independence. Notably, the DGPs consistently perform as well as or better than single-layer GPs, and performance improves with additional layers without overfitting, even on smaller datasets. The results on large-scale data, such as the New York taxi trip dataset, are particularly striking, with the DGP achieving significant improvements over its single-layer counterparts.
Implications and Future Directions
The implications of this research are considerable for both theory and practice in machine learning. By addressing the inference challenges in DGPs, this work paves the way for broader adoption of non-parametric deep models in applications that require robust uncertainty estimation. Future research might explore richer mean functions within layers, or adapt this inference scheme to other kinds of structured probabilistic models. Additionally, integration with domain-specific kernels could further enhance the expressive power and applicability of DGPs.
In summary, the proposed doubly stochastic variational inference method represents a meaningful stride in making deep Gaussian processes a more viable tool for machine learning practitioners, enhancing model expressiveness while ensuring scalability and computational feasibility.