Papers
Topics
Authors
Recent
Search
2000 character limit reached

GPU-Accelerated Vecchia Approximations of Gaussian Processes for Geospatial Data using Batched Matrix Computations

Published 12 Mar 2024 in stat.CO and cs.DC | (2403.07412v3)

Abstract: Gaussian processes (GPs) are commonly used for geospatial analysis, but they suffer from high computational complexity when dealing with massive data. For instance, the log-likelihood function required in estimating the statistical model parameters for geospatial data is a computationally intensive procedure that involves computing the inverse of a covariance matrix with size n X n, where n represents the number of geographical locations. As a result, in the literature, studies have shifted towards approximation methods to handle larger values of n effectively while maintaining high accuracy. These methods encompass a range of techniques, including low-rank and sparse approximations. Vecchia approximation is one of the most promising methods to speed up evaluating the log-likelihood function. This study presents a parallel implementation of the Vecchia approximation, utilizing batched matrix computations on contemporary GPUs. The proposed implementation relies on batched linear algebra routines to efficiently execute individual conditional distributions in the Vecchia algorithm. We rely on the KBLAS linear algebra library to perform batched linear algebra operations, reducing the time to solution compared to the state-of-the-art parallel implementation of the likelihood estimation operation in the ExaGeoStat software by up to 700X, 833X, 1380X on 32GB GV100, 80GB A100, and 80GB H100 GPUs, respectively. We also successfully manage larger problem sizes on a single NVIDIA GPU, accommodating up to 1M locations with 80GB A100 and H100 GPUs while maintaining the necessary application accuracy. We further assess the accuracy performance of the implemented algorithm, identifying the optimal settings for the Vecchia approximation algorithm to preserve accuracy on two real geospatial datasets: soil moisture data in the Mississippi Basin area and wind speed data in the Middle East.

Definition Search Book Streamline Icon: https://streamlinehq.com
References (47)
  1. R. Furrer, M. G. Genton, and D. Nychka, “Covariance tapering for interpolation of large spatial datasets,” Journal of Computational and Graphical Statistics, vol. 15, no. 3, pp. 502–523, 2006.
  2. C. G. Kaufman, M. J. Schervish, and D. W. Nychka, “Covariance tapering for likelihood-based estimation in large spatial data sets,” Journal of the American Statistical Association, vol. 103, no. 484, pp. 1545–1555, 2008.
  3. M. Bevilacqua, A. Fassò, C. Gaetan, E. Porcu, and D. Velandia, “Covariance tapering for multivariate Gaussian random fields estimation,” Statistical Methods & Applications, vol. 25, pp. 21–37, 2016.
  4. D. Nychka, S. Bandyopadhyay, D. Hammerling, F. Lindgren, and S. Sain, “A multiresolution Gaussian process model for the analysis of large spatial datasets,” Journal of Computational and Graphical Statistics, vol. 24, no. 2, pp. 579–599, 2015.
  5. S. Abdulah, H. Ltaief, Y. Sun, M. G. Genton, and D. E. Keyes, “ExaGeoStat: A high performance unified software for geostatistics on manycore systems,” IEEE Transactions on Parallel and Distributed Systems, vol. 29, no. 12, pp. 2771–2784, 2018.
  6. Abdulah, Sameh and Ltaief, Hatem and Sun, Ying and Genton, Marc G and Keyes, David E, “Geostatistical modeling and prediction using mixed precision tile cholesky factorization,” in 2019 IEEE 26th international conference on high performance computing, data, and analytics (HiPC).   IEEE, 2019, pp. 152–162.
  7. S. Abdulah, Q. Cao, Y. Pei, G. Bosilca, J. Dongarra, M. G. Genton, D. E. Keyes, H. Ltaief, and Y. Sun, “Accelerating geostatistical modeling and prediction with mixed-precision computations: A high-productivity approach with parsec,” IEEE Transactions on Parallel and Distributed Systems, vol. 33, no. 4, pp. 964–976, 2021.
  8. Q. Cao, S. Abdulah, R. Alomairy, Y. Pei, P. Nag, G. Bosilca, J. Dongarra, M. G. Genton, D. E. Keyes, H. Ltaief et al., “Reshaping geostatistical modeling and prediction for extreme-scale environmental applications,” in 2022 SC22: International Conference for High Performance Computing, Networking, Storage and Analysis (SC).   IEEE Computer Society, 2022, pp. 13–24.
  9. M. Katzfuss and N. Cressie, “Spatio-temporal smoothing and em estimation for massive remote-sensing data sets,” Journal of Time Series Analysis, vol. 32, no. 4, pp. 430–446, 2011.
  10. H. Huang and Y. Sun, “Hierarchical low rank approximation of likelihoods for large spatial datasets,” Journal of Computational and Graphical Statistics, vol. 27, no. 1, pp. 110–118, 2018.
  11. S. Abdulah, H. Ltaief, Y. Sun, M. G. Genton, and D. E. Keyes, “Parallel approximation of the maximum likelihood estimation for the prediction of large-scale geostatistics simulations,” in 2018 IEEE international conference on cluster computing (CLUSTER).   IEEE, 2018, pp. 98–108.
  12. S. Mondal, S. Abdulah, H. Ltaief, Y. Sun, M. G. Genton, and D. E. Keyes, “Parallel approximations of the tukey g-and-h likelihoods and predictions for non-Gaussian geostatistics,” in 2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS).   IEEE, 2022, pp. 379–389.
  13. A. V. Vecchia, “Estimation and model identification for continuous spatial processes,” Journal of the Royal Statistical Society Series B: Statistical Methodology, vol. 50, no. 2, pp. 297–312, 1988.
  14. M. Katzfuss and J. Guinness, “A general framework for Vecchia approximations of Gaussian processes,” 2021.
  15. M. Katzfuss, J. Guinness, and E. Lawrence, “Scaled Vecchia approximation for fast computer-model emulation,” SIAM/ASA Journal on Uncertainty Quantification, vol. 10, no. 2, pp. 537–554, 2022.
  16. J. Zhang and M. Katzfuss, “Multi-scale Vecchia approximations of Gaussian processes,” Journal of Agricultural, Biological and Environmental Statistics, vol. 27, no. 3, pp. 440–460, 2022.
  17. F. Jimenez and M. Katzfuss, “Scalable Bayesian optimization using vecchia approximations of Gaussian processes,” in International Conference on Artificial Intelligence and Statistics.   PMLR, 2023, pp. 1492–1512.
  18. “The Top 500 List,” https://top500.org/, [Online; accessed 28-November-2023].
  19. A. Haidar, T. Dong, P. Luszczek, S. Tomov, and J. Dongarra, “Batched matrix computations on hardware accelerators based on GPUs,” The International Journal of High Performance Computing Applications, vol. 29, no. 2, pp. 193–208, 2015.
  20. A. Abdelfattah, S. Tomov, and J. Dongarra, “Fast batched matrix multiplication for small sizes using half-precision arithmetic on GPUs,” in 2019 IEEE international parallel and distributed processing symposium (IPDPS).   IEEE, 2019, pp. 111–122.
  21. J. Guinness, “Permutation and grouping methods for sharpening Gaussian process approximations,” Technometrics, vol. 60, no. 4, pp. 415–429, 2018.
  22. Guinness, Joseph, “Gaussian process learning via Fisher scoring of Vecchia’s approximation,” Statistics and Computing, vol. 31, no. 3, p. 25, 2021.
  23. H. Huang, S. Abdulah, Y. Sun, H. Ltaief, D. E. Keyes, and M. G. Genton, “Competition on spatial statistics for large datasets,” Journal of Agricultural, Biological and Environmental Statistics, vol. 26, pp. 580–595, 2021.
  24. S. Abdulah, F. Alamri, P. Nag, Y. Sun, H. Ltaief, D. E. Keyes, and M. G. Genton, “The second competition on spatial statistics for large datasets,” arXiv preprint arXiv:2211.03119, 2022.
  25. Y. Hong, Y. Song, S. Abdulah, Y. Sun, H. Ltaief, D. E. Keyes, and M. G. Genton, “The third competition on spatial statistics for large datasets,” Journal of Agricultural, Biological and Environmental Statistics, pp. 1–18, 2023.
  26. R. Huser, M. L. Stein, and P. Zhong, “Vecchia likelihood approximation for accurate and fast inference in intractable spatial extremes models,” arXiv preprint arXiv:2203.05626, 2022.
  27. Q. Vu, A. Zammit-Mangion, and S. J. Chuter, “Constructing large nonstationary spatio-temporal covariance models via compositional warpings,” Spatial Statistics, vol. 54, p. 100742, 2023.
  28. J. Zhang, S. You, and L. Gruenwald, “Large-scale spatial data processing on GPUs and GPU-accelerated clusters,” Sigspatial Special, vol. 6, no. 3, pp. 27–34, 2015.
  29. X. Li, T. Huang, D.-T. Lu, and C. Niu, “Accelerating experimental high-order spatial statistics calculations using GPUs,” Computers & Geosciences, vol. 70, pp. 128–137, 2014.
  30. J. Zhang, S. You, and L. Gruenwald, “Efficient parallel zonal statistics on large-scale global biodiversity data on GPUs,” in Proceedings of the 4th International ACM SIGSPATIAL Workshop on Analytics for Big Geospatial Data, 2015, pp. 35–44.
  31. G. Zhang, A.-X. Zhu, and Q. Huang, “A GPU-accelerated adaptive kernel density estimation approach for efficient point pattern analysis on spatial big data,” International Journal of Geographical Information Science, vol. 31, no. 10, pp. 2068–2097, 2017.
  32. S. K. Prasad, M. McDermott, S. Puri, D. Shah, D. Aghajarian, S. Shekhar, and X. Zhou, “A vision for GPU-accelerated parallel computation on geo-spatial datasets,” SIGSPATIAL Special, vol. 6, no. 3, pp. 19–26, 2015.
  33. K. Wang, S. Abdulah, Y. Sun, and M. G. Genton, “Which parameterization of the Matérn covariance function?” Spatial Statistics, vol. 58, p. 100787, 2023.
  34. J. Duchi, “Derivations for linear algebra and optimization,” Berkeley, California, vol. 3, no. 1, pp. 2325–5870, 2007.
  35. M. Gates, J. Kurzak, A. Charara, A. YarKhan, and J. Dongarra, “SLATE: Design of a modern distributed and accelerated linear algebra library,” in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2019, pp. 1–18.
  36. J. Dongarra, M. Gates, A. Haidar, J. Kurzak, P. Luszczek, P. Wu, I. Yamazaki, A. YarKhan, M. Abalenkovs, N. Bagherpour et al., “PLASMA: Parallel linear algebra software for multicore using openmp,” ACM Transactions on Mathematical Software (TOMS), vol. 45, no. 2, pp. 1–35, 2019.
  37. S. Abdulah, K. Akbudak, W. Boukaram, A. Charara, D. Keyes, H. Ltaief, A. Mikhalev, D. Sukkari, and G. Turkiyyah, “Hierarchical computations on manycore architectures (hicma),” See http://github. com/ecrc/hicma, 2019.
  38. S. Zampini, W. Boukaram, G. Turkiyyah, O. Knio, and D. Keyes, “H2opus: a distributed-memory multi-GPU software package for non-local operators,” Advances in Computational Mathematics, vol. 48, no. 3, p. 31, 2022.
  39. E. Agullo, J. Demmel, J. Dongarra, B. Hadri, J. Kurzak, J. Langou, H. Ltaief, P. Luszczek, and S. Tomov, “Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects,” in Journal of Physics: Conference Series, vol. 180, no. 1.   IOP Publishing, 2009, p. 012037.
  40. A. Abdelfattah, D. Keyes, and H. Ltaief, “KBLAS: An optimized library for dense matrix-vector multiplication on GPU accelerators,” ACM Transactions on Mathematical Software (TOMS), vol. 42, no. 3, pp. 1–31, 2016.
  41. F. G. Van Zee and R. A. Van De Geijn, “BLIS: A framework for rapidly instantiating BLAS functionality,” ACM Transactions on Mathematical Software (TOMS), vol. 41, no. 3, pp. 1–33, 2015.
  42. T. Dong, A. Haidar, P. Luszczek, S. Tomov, A. Abdelfattah, and J. Dongarra, “MAGMA batched: A batched BLAS approach for small matrix factorizations and applications on GPUs,” Technical Report. Technical report, Tech. Rep., 2016.
  43. J. Dongarra, S. Hammarling, N. J. Higham, S. D. Relton, P. Valero-Lara, and M. Zounon, “The design and performance of batched BLAS on modern high-performance computing systems,” Procedia Computer Science, vol. 108, pp. 495–504, 2017.
  44. K. Akbudak, H. Ltaief, A. Mikhalev, and D. Keyes, “Tile low rank cholesky factorization for climate/weather modeling applications on manycore architectures,” in International Conference on High Performance Computing.   Springer, 2017, pp. 22–40.
  45. Z. Geng, S. Abdulah, H. Ltaief, Y. Sun, M. G. Genton, and D. E. Keyes, “GPU-accelerated dense covariance matrix generation for spatial statistics applications,” 2023.
  46. N. W. Chaney, P. Metcalfe, and E. F. Wood, “HydroBlocks: A field-scale resolving land surface model for application over continental extents,” Hydrological Processes, vol. 30, no. 20, pp. 3543–3559, 2016.
  47. J. Powers, X.-Y. Huang, B. Klemp, C. Skamarock, J. Dudhia, and O. Gill, “A description of the advanced research WRF version 2,” NCAR tech, vol. 15, 2008.
Citations (2)

Summary

  • The paper introduces a GPU-accelerated Vecchia approximation for Gaussian processes that achieves up to 1380X speedups while maintaining accuracy comparable to exact MLE.
  • It leverages batched matrix computations and the KBLAS library to efficiently handle large-scale geospatial datasets on NVIDIA GPUs such as GV100, A100, and H100.
  • The study evaluates different reordering strategies and demonstrates improved memory efficiency and scalability for modeling real-world climate data like soil moisture and wind speed.

GPU-Accelerated Gaussian Process Approximation via Batched Vecchia

This paper introduces a GPU-accelerated implementation of the Vecchia approximation for Gaussian processes (GPs), leveraging batched matrix computations to enhance computational efficiency in geospatial data analysis. The implementation targets the efficient estimation of statistical model parameters for large-scale climate and weather applications. By utilizing the KBLAS library and batched linear algebra operations, the authors demonstrate significant speedups on NVIDIA GPU architectures, including GV100, A100, and H100, while maintaining accuracy comparable to exact maximum likelihood estimation (MLE).

Key Contributions

The paper makes several notable contributions to the field of computational statistics and high-performance computing:

  • A GPU-accelerated implementation of the Vecchia approximation algorithm is presented, designed for efficient parameter estimation in climate and weather applications.
  • The use of the KBLAS library and batched linear algebra operations accelerates the implementation on modern NVIDIA GPUs, including GV100, A100, and H100.
  • The accuracy of the proposed implementation is assessed through numerical studies and real-world datasets, including soil moisture data from the Mississippi Basin and wind speed data from the Middle East. The study identifies optimal settings for the Vecchia algorithm to achieve performance comparable to exact MLE as implemented in the ExaGeoStat software.
  • Performance evaluations on NVIDIA GPUs show significant speedups, reaching up to 700X on GV100, 833X on A100, and 1380X on H100, compared to exact MLE.
  • The implementation accommodates larger problem sizes within the same GPU memory, enabling improved modeling for high-resolution geospatial data.

Batched Vecchia Approximation Framework

The proposed framework leverages batched matrix operations to accelerate the Vecchia approximation algorithm for Gaussian fields. The method involves reordering the location set and selecting the nearest neighbors for each location. The batched Vecchia approximated likelihood algorithm is described in detail (Figure 1), which involves replacing the high-dimensional joint distribution of the GP with a product of univariate conditional distributions: Figure 1

Figure 1: Batched Vecchia algorithm description. Σm:n\bm \Sigma_{m:n} are constructed by the nearest neighbors of ym:nτ\mathbf y^{\tau}_{m:n}. The batched POTRF routine is applied to these matrices. After this decomposition, the resulting outputs are utilized as inputs for the batched TRSV operation with vm:n\mathbf v_{m:n} and $\mathbf y^{\tau}_{\mathbf J_{m:n}$, separately.

For each spatial location, the algorithm computes a covariance matrix of its nearest neighbors and a cross-covariance vector between the location and its nearest neighbors. The approach applies uniform operations to batches of small matrices to leverage the underlying GPU accelerators. The implementation details include the use of a batched CUDA kernel for covariance matrix generation, batched Cholesky decomposition (POTRF), and triangular linear solver (TRSV) routines from the KBLAS library.

Reordering Strategies

The paper explores the impact of different reordering strategies on the accuracy of the log-likelihood approximation. The choice of reordering method is crucial for selecting nearest-neighbor points for each location. The study compares random ordering and Morton ordering, highlighting that random ordering generally outperforms Morton ordering for large-scale problems (Figure 2). Figure 2

Figure 2

Figure 2

Figure 2

Figure 2: The example of random and Morton ordering on locations 20×2020 \times 20. (First row) The 45th and 250th locations (red stars) in the random ordering are marked with their nearest neighbors (orange circle); (Second row) The 45th and 250th locations (red stars) in the Morton order algorithm are marked with their nearest neighbors (orange circle). Blue circles indicate past locations for a given ordering algorithm.

Random ordering retains nearest neighbors immediately surrounding a target location, whereas Morton ordering may initially sacrifice proximity accuracy for earlier-ordered locations.

Performance and Accuracy Assessment

The paper includes a comprehensive performance and accuracy assessment of the batched Vecchia algorithm. Numerical studies using Kullback-Leibler (KL) divergence compare the approximation to exact MLE. The results indicate that the complexity of the approximation increases with the range or smoothness parameters of the Matérn kernel. Also, as the problem size increases, more conditioning points are required to maintain approximation accuracy.

Real-world datasets, including soil moisture and wind speed data, are used to evaluate the accuracy of the batched Vecchia algorithm in modeling and prediction tasks. The estimated parameter vectors closely align with those obtained via ExaGeoStat (exact MLE), particularly as the number of conditioning neighbors increases (Figure 3). Figure 3

Figure 3

Figure 3

Figure 3

Figure 3

Figure 3

Figure 3: The estimated parameter vectors using Vecchia approximation with different conditioning sizes compared to {\it ExaGeoStat} (exact MLE). The first row is the parameter vector for soil moisture, and the second for wind speed.

The performance evaluation demonstrates significant speedups on NVIDIA GPUs, with the Vecchia method outperforming ExaGeoStat-GPU in single likelihood estimation time. The batched Vecchia approximation can handle larger problem sizes, up to 1 million locations, on a single GPU.

Memory Footprint and Arithmetic Complexity

The paper analyzes the memory footprint and arithmetic complexity of the batched Vecchia implementation compared to exact MLE (Figure 4). Figure 4

Figure 4

Figure 4: Comparison of Arithmetic complexity: Vecchia algorithm versus Exact MLE.

The Vecchia algorithm significantly reduces memory requirements and computational complexity, making it suitable for large-scale geospatial data analysis.

Conclusions and Future Directions

The paper demonstrates the effectiveness of a GPU-accelerated Vecchia approximation for Gaussian processes, achieving significant speedups while maintaining accuracy. The use of batched matrix computations and the KBLAS library enables efficient parameter estimation for large-scale geospatial datasets. The study identifies optimal settings for the Vecchia algorithm and showcases its applicability to real-world problems in climate and weather modeling. This work paves the way for future research in scalable Gaussian process approximations and their application to various scientific domains. The source code for this work is publicly available.

Paper to Video (Beta)

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We found no open problems mentioned in this paper.

Collections

Sign up for free to add this paper to one or more collections.

Tweets

Sign up for free to view the 6 tweets with 2 likes about this paper.