Leveraging Locality and Robustness to Achieve Massively Scalable Gaussian Process Regression (2306.14731v2)
Abstract: The accurate predictions and principled uncertainty measures provided by Gaussian process (GP) regression come at an O(n³) cost that is prohibitive for modern large-scale applications, motivating extensive work on computationally efficient approximations. We introduce a new perspective by exploring the robustness properties and limiting behaviour of GP nearest-neighbour (GPnn) prediction. We demonstrate through theory and simulation that, as the data size n increases, the accuracy of estimated parameters and the validity of GP model assumptions become increasingly irrelevant to GPnn predictive accuracy. Consequently, it suffices to spend only a small amount of effort on parameter estimation to achieve high predictive (MSE) accuracy, even in the presence of gross misspecification. In contrast, as n tends to infinity, uncertainty calibration and negative log-likelihood (NLL) are shown to remain sensitive to a single parameter, the additive noise variance; but we show that this source of inaccuracy can be corrected for, thereby achieving both well-calibrated uncertainty measures and accurate predictions at remarkably low computational cost. We exhibit a very simple GPnn regression algorithm with stand-out performance compared to other state-of-the-art GP approximations as measured on large UCI datasets. It operates at a small fraction of those other methods' training costs; for example, on a basic laptop it takes about 30 seconds to train on a dataset of size n = 1.6 × 10⁶.
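The GPnn idea described in the abstract is straightforward to realise: for each test point, retrieve its m nearest training points and run exact GP prediction on that small subset only, so the cubic cost applies to m rather than n. Below is a minimal sketch of this scheme, assuming an RBF kernel with fixed (possibly crudely estimated) hyperparameters; the function names, the default m = 32, and the kernel choice are illustrative assumptions, not the paper's exact implementation, and the paper's noise-variance recalibration step is omitted.

```python
# Minimal GP nearest-neighbour (GPnn) prediction sketch.
# Assumptions: RBF kernel, fixed hyperparameters; names and defaults are illustrative.
import numpy as np
from scipy.spatial import cKDTree


def rbf_kernel(A, B, lengthscale=1.0, signal_var=1.0):
    """Squared-exponential kernel k(a, b) = s² exp(-||a - b||² / (2 l²))."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return signal_var * np.exp(-0.5 * d2 / lengthscale**2)


def gpnn_predict(X, y, X_star, m=32, lengthscale=1.0, signal_var=1.0, noise_var=0.1):
    """For each test point, run exact GP regression on its m nearest neighbours."""
    tree = cKDTree(X)                  # built once; queries are cheap thereafter
    _, idx = tree.query(X_star, k=m)   # indices of the m nearest training points
    means = np.empty(len(X_star))
    variances = np.empty(len(X_star))
    for i, x in enumerate(X_star):
        Xm, ym = X[idx[i]], y[idx[i]]
        K = rbf_kernel(Xm, Xm, lengthscale, signal_var) + noise_var * np.eye(m)
        k_star = rbf_kernel(x[None, :], Xm, lengthscale, signal_var).ravel()
        means[i] = k_star @ np.linalg.solve(K, ym)          # posterior mean
        variances[i] = (signal_var + noise_var              # noisy predictive variance
                        - k_star @ np.linalg.solve(K, k_star))
    return means, variances
```

Per test point this costs one neighbour query plus an O(m³) solve, independent of n, which is consistent with the near-constant per-prediction cost the abstract reports. The noise variance passed above could then be recalibrated on a held-out split, for instance by minimising the NLL of the resulting predictive distribution; this is a hedged analogue of the correction the abstract alludes to, not necessarily the paper's exact procedure.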