Iterative Methods for Vecchia-Laplace Approximations for Latent Gaussian Process Models (2310.12000v4)
Abstract: Latent Gaussian process (GP) models are flexible probabilistic non-parametric function models. Vecchia approximations are accurate approximations for GPs that overcome computational bottlenecks for large data sets, and the Laplace approximation is a fast method with asymptotic convergence guarantees for approximating marginal likelihoods and posterior predictive distributions for non-Gaussian likelihoods. Unfortunately, the computational complexity of combined Vecchia-Laplace approximations grows faster than linearly in the sample size when used in combination with direct solvers such as the Cholesky decomposition. Computations with Vecchia-Laplace approximations can thus become prohibitively slow precisely when the approximations are usually the most accurate, i.e., on large data sets. In this article, we present iterative methods to overcome this drawback. Among other things, we introduce and analyze several preconditioners, derive new convergence results, and propose novel methods for accurately approximating predictive variances. We analyze our proposed methods both theoretically and in experiments with simulated and real-world data. In particular, we obtain a speed-up of an order of magnitude compared to Cholesky-based calculations and a threefold increase in prediction accuracy, measured by the continuous ranked probability score, compared to a state-of-the-art method on a large satellite data set. All methods are implemented in a free C++ software library with high-level Python and R packages.
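The sketch below illustrates the core computational idea the abstract describes, using only generic NumPy/SciPy tools: replacing a Cholesky factorization of a Vecchia-Laplace-style system matrix of the form B^T D^{-1} B + W (sparse unit lower-triangular B, positive diagonal D, diagonal likelihood Hessian W) with preconditioned conjugate gradient (CG) solves, plus Hutchinson-type stochastic trace estimation of the kind used for log-determinant derivative terms. Everything here is a minimal, assumption-laden sketch: the matrices are synthetic stand-ins, the sizes n and m are arbitrary, and the Jacobi preconditioner is the simplest possible placeholder, not one of the preconditioners the paper proposes. This is not the authors' implementation, which per the abstract lives in a C++ library with Python and R interfaces.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, cg

rng = np.random.default_rng(0)
n, m = 10_000, 20  # sample size and number of Vecchia neighbors (arbitrary)

# Synthetic sparse unit lower-triangular B with at most m sub-diagonal entries
# per row, mimicking the sparsity pattern a Vecchia approximation induces.
rows, cols, vals = [], [], []
for i in range(1, n):
    k = min(i, m)
    nbrs = rng.choice(i, size=k, replace=False)
    rows.extend([i] * k)
    cols.extend(nbrs)
    vals.extend(rng.normal(scale=0.1, size=k))
B = sp.eye(n, format="csr") + sp.csr_matrix((vals, (rows, cols)), shape=(n, n))
D_inv = sp.diags(1.0 / rng.uniform(0.5, 1.5, n))  # D^{-1}, positive diagonal
W = sp.diags(rng.uniform(0.1, 2.0, n))            # diagonal likelihood Hessian

A = (B.T @ D_inv @ B + W).tocsr()  # SPD Vecchia-Laplace-style system matrix
r = rng.normal(size=n)

# Jacobi (diagonal) preconditioner: the simplest choice, used here only as a
# placeholder for the more effective preconditioners studied in the paper.
diag = A.diagonal()
M = LinearOperator((n, n), matvec=lambda x: x / diag)

b, info = cg(A, r, M=M)  # iterative solve instead of a Cholesky factorization
assert info == 0, "CG did not converge"

# Hutchinson-type stochastic trace estimation, a standard trick for
# log-determinant derivative terms: tr(A^{-1} P) is approximated by the mean
# of z^T A^{-1} P z over Rademacher probe vectors z, each solve done by CG.
P = sp.diags(rng.uniform(0.5, 1.5, n))  # placeholder for a derivative matrix
num_probes, est = 10, 0.0
for _ in range(num_probes):
    z = rng.choice([-1.0, 1.0], size=n)
    u, _ = cg(A, P @ z, M=M)
    est += (z @ u) / num_probes
print(f"stochastic estimate of tr(A^-1 P): {est:.2f}")
```

Note on scaling: the sketch forms A explicitly for brevity, but in practice one would apply it matrix-free as A x = B^T (D^{-1} (B x)) + W x, so each CG iteration costs O(nm) sparse matrix-vector work. This is what lets iterative methods scale roughly linearly in the sample size, whereas direct factorizations suffer from fill-in.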