Parallel inference for massive distributed spatial data using low-rank models
Abstract: Due to rapid data growth, statistical analysis of massive datasets often has to be carried out in a distributed fashion, either because several datasets stored in separate physical locations are all relevant to a given problem, or simply to achieve faster (parallel) computation through a divide-and-conquer scheme. In both cases, the challenge is to obtain valid inference that does not require processing all data at a single central computing node. We show that for a very widely used class of spatial low-rank models, which can be written as a linear combination of spatial basis functions plus a fine-scale-variation component, parallel spatial inference and prediction for massive distributed data can be carried out exactly, meaning that the results are the same as for a traditional, non-distributed analysis. The communication cost of our distributed algorithms does not depend on the number of data points. After extending our results to the spatio-temporal case, we illustrate our methodology by carrying out distributed spatio-temporal particle filtering inference on total precipitable water measured by three different satellite sensor systems.
Paper Prompts
Sign up for free to create and run prompts on this paper using GPT-5.
Top Community Prompts
Collections
Sign up for free to add this paper to one or more collections.