SpatialPlus: Unbiased Estimation in Spatial Models
- SpatialPlus is a set of methodologies that mitigate spatial confounding by decorrelating covariates from smooth spatial random effects.
- The frequentist approach employs a two-stage residualization with thin-plate splines to remove first-order smoothing-induced bias in fixed-effect estimates.
- Advanced extensions include Bayesian hierarchical models and spectral generalizations that propagate uncertainty and improve inference in areal data.
SpatialPlus (“Spatial+”) comprises a set of statistical methodologies devised to mitigate bias arising from spatial confounding in spatial regression models. In these models, collinearity between spatially indexed covariates and spatial random effects often introduces significant bias in fixed-effect estimates, particularly when both are smooth functions of space. Spatial+ achieves bias reduction by actively decorrelating covariates from the spatial structure, thereby isolating fixed-effect inference from the influence of spatial smoothing. The core concept originated with frequentist two-stage adjustment using thin-plate splines, has been extended to Bayesian hierarchical modeling with smoothness parameter coupling, and has inspired spectral domain and areal data generalizations.
1. Spatial Confounding and Smoothing-Induced Bias
Spatial confounding occurs when observed covariates used in regression share similar spatial patterns with random effects or smoothers included to account for spatial residuals. In the canonical model,
and may be approximately collinear, allowing the smoother to absorb variation attributable to . As articulated by Rice (1986), penalized smoothing (e.g., with thin-plate splines) introduces a smoothing-induced bias in , which can decay at the same or a slower rate than the estimator's standard error, so that smoothing-induced bias can dominate in finite samples (Dupont et al., 2020).
Generalized spatial regression formulations—including those using partial splines, GMRF/ICAR random effects, or multivariate areal models—are equally susceptible. The identifiability of is compromised by shared large-scale and/or high-frequency spatial features between covariates and random effects, a phenomenon not remedied merely by increased sample size.
2. The Spatial+ Methodology: Frequentist Formulation
The Spatial+ approach, as introduced by Dupont, Wood, and Augustin, involves a two-stage residualization procedure that decouples the covariate from the spatial pattern captured by the smoother. The prototypical procedure is:
- For each covariate , fit the spatial model using thin-plate splines and extract the residuals .
- Fit the primary regression model using in place of : Bias analysis shows that the leading-order smoothing-induced bias in disappears to first order, resulting in an estimator that is asymptotically unbiased: $\hat\beta^+ - \beta = o(n^{-1/2}) + O\left((\lambda \lambda_x)^{1/2}\right), \quad \text{with } \Var(\hat\beta^+) \sim \frac{\sigma^2}{n\sigma_x^2}$ When tuning parameters and are chosen optimally, the bias decays faster than sampling variability, unlike in the unadjusted model (Dupont et al., 2020).
This residualization ensures that, under the natural spline basis, is (approximately) orthogonal to , preventing the smoother from "stealing" covariate effects.
3. Extensions: Bayesian Spatial+, Joint Smoothing Priors, and Uncertainty Quantification
A major limitation of frequentist Spatial+ is the lack of propagation of uncertainty from covariate residualization to the primary response model. The Bayesian Spatial+ framework jointly models the covariate and response:
- Stage 1 (Covariate):
- Stage 2 (Response): where .
All model parameters—including smoothers' coefficients, smoothing penalties, and regression coefs—are sampled simultaneously in a hierarchical setting (typically via HMC/NUTS in RStan). This enables full posterior propagation of uncertainty, meaning that interval estimates for reflect all sources of variability, in contrast to the plug-in two-stage approach (Marques et al., 2023).
A novel feature of Bayesian Spatial+ is the joint prior on smoothing parameters, ensuring the response surface is smoother than the covariate surface : This restricts the response smoother from capturing high-frequency spatial features that are not present in the covariate residuals, directly mitigating spectral confounding and preventing overfitting in the second stage.
4. Spectral and Areal Data Generalizations
Areal models, especially in the context of multivariate counts, are prone to spatial confounding due to similar issues of collinearity between covariates and spatial random effects. The simplified spatial+ approach as in Urdangarin et al. (Urdangarin et al., 2023) operates via spectral decomposition:
- Decompose the areal adjacency/precision matrix into eigenvectors: .
- Covariate is split into large-scale () and short-scale () components, where holds eigenvectors corresponding to the smoothest spatial direction.
- The regression model uses only , thus orthogonalizing the fixed effect against the spatial random effect .
This method is particularly computationally efficient (implemented via INLA) and is adaptable to multivariate, multi-covariate settings, albeit with the need to select the cutoff .
5. Empirical Performance and Simulation Studies
Simulation studies consistently demonstrate that:
- Standard spatial regression (uncorrected) yields substantial bias in under moderate to strong confounding.
- Both frequentist and Bayesian Spatial+ methods substantially reduce bias; the Bayesian variant achieves near-nominal coverage for and better MSE, especially under moderate or large sample sizes (Marques et al., 2023, Dupont et al., 2020, Urdangarin et al., 2023).
- When confounding arises from small-scale spatial structure, all methods suffer nonvanishing bias, though Bayesian Spatial+ maintains superior mean squared error properties.
The table below (Scenario 1 from (Urdangarin et al., 2023)) illustrates the gain:
| Model | True | Mean Est. | Bias | 95% Coverage |
|---|---|---|---|---|
| M–Spatial | -0.15 | -0.2996 | -0.1496 | 35.7% |
| M–SpatPlus(10) | -0.15 | -0.1863 | -0.0363 | 95.3% |
In real-world applications (e.g., German precipitation and Indian district-level crime counts), Spatial+ adjustments yield enlarged effect estimates and wider, more reliable credible intervals, preventing overconfident inference and improving interpretability (Marques et al., 2023, Urdangarin et al., 2023).
6. Implementation and Practical Considerations
Spatial+ is readily implementable with standard spline-based or areal spatial modeling software. For frequentist estimation, software such as mgcv enables specification of sequential residualization via thin-plate splines. For Bayesian inference, RStan with HMC/NUTS is used, with diagnostic checks on and effective sample size (Marques et al., 2023). INLA enables rapid inference for the simplified spatial+ model in areal and multivariate contexts.
The selection of the spectral cutoff in areal settings remains an open methodological question; practical heuristics suggest removing 7–20% of eigenvectors for smooth covariates, up to 40% for mixed spatial scales. Posterior standard errors can be inflated, and model selection metrics (e.g., DIC, WAIC) may inadequately distinguish among plausible values of .
Spatial+ is directly extensible to non-Gaussian responses and to a wide range of spatial modeling frameworks, provided residualization is performed using the corresponding smoothing/quadratic penalty defined by the prior or spatial effect (Dupont et al., 2020, Urdangarin et al., 2023).
7. Limitations and Extensions
Current limitations include the restriction of original Spatial+ to linear covariate effects, the need for separate residualization for each linear term, and incomplete theoretical guarantees for high-dimensional or nonlinear covariate scenarios. Automated or cross-validated selection of spectral cutoffs in areal implementations, as well as variance component inference and improved standard error calibration, are active areas of research (Urdangarin et al., 2023). Extensions to allow joint decorrelation of multiple covariates and to handle nonlinear/additive smooth effects remain open.
Spatial+ and its modern Bayesian generalizations provide a principled solution to the pervasive problem of spatial confounding, yielding unbiased or nearly-unbiased effect estimates, valid uncertainty quantification, and better fixed-effect inference in spatial regression and areal models (Dupont et al., 2020, Marques et al., 2023, Urdangarin et al., 2023).