Bivariate Spatial Clustering (LISA)
- Bivariate spatial clustering analysis is a framework that examines local spatial autocorrelation and cross-variable associations using LISA metrics.
- It integrates methods such as Bayesian hierarchical models, CAR mixtures, and adaptive lasso to identify and quantify spatial clusters.
- Applications in epidemiology, socioeconomics, and environmental studies reveal local hotspots and underlying spatial patterns.
Bivariate spatial clustering analysis using Local Indicators of Spatial Association (LISA) constitutes a rigorous statistical framework for identifying and characterizing spatial clusters within two variables (typically paired continuous outcomes) across a spatial domain. This analytic class underpins a diversity of methodologies, including canonical LISA metrics (e.g., Local Moran's I, local Geary's C), conditional graphical models, clustering-enriched regression models, and spatially adaptive factor analysis. The interrelation between spatial autocorrelation—dependency among neighboring spatial units—and cross-variable (bivariate) association forms the methodological core of modern bivariate spatial clustering frameworks.
1. Conceptual Foundations of Bivariate Spatial Clustering and LISA
Bivariate spatial clustering analysis extends univariate LISA approaches to assess not only spatial autocorrelation within each variable, but also inherent or endemic associations between variables as they manifest over space. The principal objective is twofold:
- Determining the spatial architecture of each variable individually, identifying local clusters (“hotspots”) and spatial outliers.
- Quantifying (and where possible, disentangling) the local and global interaction of clustering between the two variables—whether strong spatial coincidence/correlation exists and in which locations.
Anselin’s foundational requirements for LISA are central: (a) each local indicator quantifies significant spatial clustering around a site; (b) the sum of all local indicators is proportional to the global spatial statistic, ensuring analytic coherence between local and global summaries (Chen, 2022). Extensions to bivariate settings necessitate additional considerations for modeling both spatial autocorrelation and cross-variable (between-variable) linkage, which may derive from common risk factors, shared environmental determinants, or other confounders.
2. Statistical Models for Bivariate Spatial Clustering
Bivariate spatial clustering can be formalized in a variety of model-based and nonparametric settings, including:
A. Bayesian Hierarchical Graphical Models
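Recent developments include a bivariate directed acyclic graphical autoregression (BDAGAR) framework (Gao et al., 2019). Consider spatial observations $y_{ij}$ of cancer incidence for disease $i$ at region $j$:

$$y_{ij} = x_{ij}^\top \beta_i + w_{ij} + e_{ij},$$

with $w_{ij}$ denoting latent spatial effects (random effects capturing structured residual spatial dependence) and $e_{ij}$ an unstructured error term. For the two diseases, BDAGAR jointly models the random effects as follows:

$$w_1 \sim N\big(0, [\tau_1 Q(\rho_1)]^{-1}\big), \qquad w_2 \mid w_1 \sim N\big(A_{21} w_1, [\tau_2 Q(\rho_2)]^{-1}\big).$$

Here, $\tau_i Q(\rho_i)$ are precision matrices encoding spatial dependencies (functions of spatial adjacency), and $A_{21} = \eta_0 I + \eta_1 M$, with $M$ the adjacency matrix, models cross-disease endemic and spatial association.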
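The conditional construction of the second disease's random effects given the first can be illustrated with a short simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the cross-disease link has the form `A21 = eta0 * I + eta1 * M`, and it substitutes a proper CAR precision `tau * (D - rho * M)` for the DAGAR precision, which has a different, DAG-ordered structure. All function and parameter names are hypothetical.

```python
import numpy as np

def path_adjacency(n):
    """Adjacency matrix M of n regions arranged on a line (toy map)."""
    M = np.zeros((n, n))
    idx = np.arange(n - 1)
    M[idx, idx + 1] = M[idx + 1, idx] = 1.0
    return M

def car_precision(M, rho, tau):
    """Proper CAR precision tau * (D - rho * M).

    Assumption: used here only as a stand-in for the DAGAR precision.
    """
    D = np.diag(M.sum(axis=1))
    return tau * (D - rho * M)

def simulate_effects(M, rho, tau, eta0, eta1, rng):
    """Draw w1, then w2 | w1 = A21 @ w1 + eps2, with A21 = eta0*I + eta1*M."""
    n = M.shape[0]
    A21 = eta0 * np.eye(n) + eta1 * M
    covs = []
    for r, t in zip(rho, tau):
        cov = np.linalg.inv(car_precision(M, r, t))
        covs.append((cov + cov.T) / 2)  # symmetrize against round-off
    w1 = rng.multivariate_normal(np.zeros(n), covs[0])
    eps2 = rng.multivariate_normal(np.zeros(n), covs[1])
    w2 = A21 @ w1 + eps2
    return w1, w2, A21

rng = np.random.default_rng(0)
M = path_adjacency(6)
w1, w2, A21 = simulate_effects(
    M, rho=(0.5, 0.2), tau=(1.0, 2.0), eta0=0.4, eta1=0.1, rng=rng
)
```

In this sketch `eta0` carries the same-region (endemic) link between the two diseases, while `eta1` lets a region's second-disease effect borrow from its neighbours' first-disease effects.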
B. Conditional Autoregressive (CAR) Mixture and Partition Models
For spatio-temporal areal data, conditional autoregressive models are embedded in clustering via nonparametric priors (e.g., Dirichlet process mixtures), producing latent spatial partitions that correspond to clusters of units with similar regression (or autoregressive) parameters (Mozdzen et al., 2022). These clusters simultaneously capture local autocorrelation and the coherence of patterns across both variables.
C. Clustering in Coefficient Surfaces
Piecewise-constant clustering of regression coefficients using fused, network-induced lasso penalties (forest lasso) provides spatially contiguous grouping of locations according to covariate-outcome relationships (Zhang et al., 17 Apr 2024). Here, the spatial “clustering” arises from shared regression parameters within clusters, detected via adaptive penalization.
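To make the idea of a network-induced fusion penalty concrete, the following sketch evaluates a generic fused-lasso objective over the edges of a tree. It is an illustration under assumptions, not the estimator of the cited paper: the objective form, the adaptive weight rule (inverse initial gap raised to `gamma`), and all names are hypothetical, and no optimizer is included.

```python
import numpy as np

def edge_gaps(beta, edges):
    """Euclidean gap between coefficient vectors at the two ends of each edge."""
    edges = np.asarray(edges)
    return np.linalg.norm(beta[edges[:, 0]] - beta[edges[:, 1]], axis=1)

def adaptive_weights(beta_init, edges, gamma=1.0, eps=1e-8):
    """Larger weights (stronger fusion) where an initial fit shows small gaps."""
    return 1.0 / (edge_gaps(beta_init, edges) + eps) ** gamma

def fused_objective(y, X, beta, edges, lam, weights=None):
    """0.5 * RSS + lam * sum over edges (i, j) of weight * ||beta_i - beta_j||."""
    resid = y - np.sum(X * beta, axis=1)  # one coefficient vector per location
    if weights is None:
        weights = np.ones(len(edges))
    return 0.5 * resid @ resid + lam * np.sum(weights * edge_gaps(beta, edges))

# Toy example: six locations on a path (a tree), two coefficient clusters.
n = 6
edges = [(i, i + 1) for i in range(n - 1)]
X = np.ones((n, 1))
beta = np.array([[1.0]] * 3 + [[3.0]] * 3)
y = np.sum(X * beta, axis=1)              # noise-free responses
obj = fused_objective(y, X, beta, edges, lam=0.5)
```

Only the single edge that crosses the cluster boundary contributes to the penalty here, which is why penalizing differences along tree edges favours piecewise-constant, spatially contiguous coefficient surfaces.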
D. Factor Analysis-Driven Clustering
Spatially clustered factor analysis assigns multivariate spatial locations into clusters sharing similar local covariance structures, using penalized log-likelihoods that favor spatial smoothness or contiguity in latent structure (Jin et al., 11 Sep 2024). Such models generalize bivariate LISA to the multivariate dependency regime.
3. LISA Metrics: Definitions, Reconstruction, and Normalization
Anselin’s LISA metrics are local decompositions of global spatial autocorrelation statistics:
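- Local Moran’s I at site $i$:

$$I_i = z_i \sum_{j} w_{ij} z_j,$$

where $z_i = (x_i - \bar{x})/\sigma$ is a standardized variable (with $\sigma$ the population standard deviation), $w_{ij} = v_{ij}/V_0$ a (globally normalized, symmetric) spatial weight, and $V_0 = \sum_i \sum_j v_{ij}$ the sum of contiguity matrix elements (Chen, 2022).
- Local Geary’s C:

$$C_i = \frac{n-1}{2n} \sum_{j} w_{ij} (z_i - z_j)^2.$$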
Canonical LISA forms (MI3 and GC3) restore validity to Anselin’s proportionality condition (sum-local = global) only if global normalization and correct standardization are enforced. Empirical studies confirm that canonical forms yield constant ratios with non-normalized indices and maintain strict adherence to the theoretical requirements, unlike commonly used row-normalized (asymmetric) versions, which break proportionality (Chen, 2022).
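The proportionality condition can be checked numerically. The sketch below assumes the globally normalized forms $I_i = z_i \sum_j w_{ij} z_j$ and $C_i = \frac{n-1}{2n} \sum_j w_{ij}(z_i - z_j)^2$, with $w_{ij} = v_{ij}/V_0$ and population-standardized $z$, and compares their sums with the textbook global Moran's I and Geary's C. The toy grid and all function names are illustrative assumptions.

```python
import numpy as np

def rook_contiguity(nrows, ncols):
    """Binary, symmetric rook-contiguity matrix V for a regular grid."""
    n = nrows * ncols
    V = np.zeros((n, n))
    for r in range(nrows):
        for c in range(ncols):
            i = r * ncols + c
            if c + 1 < ncols:   # east neighbour
                V[i, i + 1] = V[i + 1, i] = 1.0
            if r + 1 < nrows:   # south neighbour
                V[i, i + ncols] = V[i + ncols, i] = 1.0
    return V

def canonical_lisa(x, V):
    """Local Moran and local Geary with globally normalized weights W = V / V0."""
    x = np.asarray(x, dtype=float)
    n = x.size
    z = (x - x.mean()) / x.std()          # population std (ddof=0)
    W = V / V.sum()                       # global normalization
    local_moran = z * (W @ z)
    sq_diff = (z[:, None] - z[None, :]) ** 2
    local_geary = (n - 1) / (2 * n) * (W * sq_diff).sum(axis=1)
    return local_moran, local_geary

def global_moran(x, V):
    """Textbook global Moran's I on the raw contiguity matrix."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return x.size / V.sum() * (d @ V @ d) / (d @ d)

def global_geary(x, V):
    """Textbook global Geary's C on the raw contiguity matrix."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    sq_diff = (x[:, None] - x[None, :]) ** 2
    return (x.size - 1) / (2 * V.sum()) * (V * sq_diff).sum() / (d @ d)

rng = np.random.default_rng(42)
V = rook_contiguity(4, 5)
x = rng.normal(size=20)
local_moran, local_geary = canonical_lisa(x, V)
moran_gap = local_moran.sum() - global_moran(x, V)
geary_gap = local_geary.sum() - global_geary(x, V)
```

Under these definitions both gaps should vanish up to floating-point error, so the proportionality constant between summed local indicators and the global statistic is exactly one.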
For bivariate analysis, LISA provides the framework for high-resolution spatial decomposition, supporting local interpretation and comparative analyses between spatial patterns of two variables. However, extensions to joint bivariate LISA are less trivial than the univariate case due to the need to account for both spatial and cross-variable dependencies.
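One widely used indicator-based extension is the bivariate local Moran statistic, $I_i^{xy} = z_{x,i} \sum_j w_{ij} z_{y,j}$, which relates the first variable at a site to the spatial lag of the second. The sketch below is a minimal illustration, assuming globally normalized weights with a zero diagonal and a conditional permutation scheme (the value of $x$ at site $i$ is held fixed while $y$ is shuffled among the other sites). The function names, the folded two-sided pseudo p-value, and the quadrant labels are assumptions made for this example.

```python
import numpy as np

def standardize(v):
    """Center and scale by the population standard deviation."""
    v = np.asarray(v, dtype=float)
    return (v - v.mean()) / v.std()

def bivariate_local_moran(x, y, V, n_perm=199, seed=0):
    """Bivariate local Moran statistics with conditional-permutation p-values.

    Assumes V is a contiguity matrix with zeros on the diagonal.
    """
    rng = np.random.default_rng(seed)
    zx, zy = standardize(x), standardize(y)
    n = zx.size
    W = V / V.sum()                        # global normalization
    lag_y = W @ zy                         # spatial lag of the second variable
    stat = zx * lag_y
    p = np.empty(n)
    for i in range(n):
        others = np.delete(np.arange(n), i)
        # Shuffle y over the other sites while x at site i stays fixed.
        perms = np.array([rng.permutation(zy[others]) for _ in range(n_perm)])
        null = zx[i] * (perms @ W[i, others])
        p[i] = (np.sum(np.abs(null) >= np.abs(stat[i])) + 1) / (n_perm + 1)
    # Quadrant: sign of x at the site, then sign of the neighbouring y.
    quadrant = np.where(
        zx >= 0,
        np.where(lag_y >= 0, "HH", "HL"),
        np.where(lag_y >= 0, "LH", "LL"),
    )
    return stat, p, quadrant

# Toy example: 12 sites on a ring, y is a noisy copy of a smooth x.
n = 12
V = np.zeros((n, n))
for i in range(n):
    V[i, (i + 1) % n] = V[(i + 1) % n, i] = 1.0
rng = np.random.default_rng(1)
x = np.sin(2 * np.pi * np.arange(n) / n)
y = x + 0.3 * rng.normal(size=n)
stat, p, quadrant = bivariate_local_moran(x, y, V)
```

Setting `y = x` recovers the univariate local Moran statistic, and the sum of `stat` equals the global cross-product $z_x^\top W z_y$, so the local-to-global decomposition carries over to this bivariate form.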
4. Computational and Methodological Considerations
Bivariate spatial clustering analysis presents nontrivial computational challenges. Key considerations include:
- Precision Matrix Construction and Efficient Sampling: In hierarchical/directed graph models, the use of sparse precision matrices (e.g., in Gaussian Markov random field representations) enables scalable Gibbs or Metropolis–Hastings sampling via block tri-diagonal or banded matrix algorithms (Gao et al., 2019, Mozdzen et al., 2022).
- Normalization Protocols: Proper global normalization of weight matrices is essential, both for analytic integrity (i.e., for LISA proportionality conditions) and interpretability (boundedness between canonical values) (Chen, 2022).
- Clustering Procedures: Bayesian nonparametric approaches (e.g., Dirichlet processes) are favored for their ability to dynamically infer the number of clusters and to provide latent partitions, obviating ad hoc specification and enhancing statistical efficiency (Mozdzen et al., 2022, Jin et al., 11 Sep 2024).
- Spatial Penalty Design: Penalties such as the adaptive forest lasso optimize computational and statistical efficiency by reducing the number of pairwise comparisons in high-dimensional networks, enabling practical recovery of contiguous clusters even in complex domains (Zhang et al., 17 Apr 2024).
- Complementarity with Non-LISA Methods: RKAD (Ripley’s K for areal data) provides a spatial clustering metric insensitive to variable centroids, complementing LISA by directly quantifying areal unit clustering (especially for binary traits) (Self et al., 2022). This offers robustness in cases where conventional LISA approaches are prone to inflated type I error or power loss due to geometric distortion.
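The benefit of sparse precision matrices for sampling can be seen in the simplest banded case. The sketch below draws from a Gaussian Markov random field with a tridiagonal precision matrix $Q$ by computing a bidiagonal Cholesky factor $L$ (with $Q = L L^\top$) and solving $L^\top x = z$ for standard normal $z$, all in linear time. It is an illustrative sketch only: the toy precision (a proper CAR form on a path graph) and the function names are assumptions, and real areal maps generally require general sparse or banded factorizations after reordering.

```python
import numpy as np

def tridiag_cholesky(diag, off):
    """Cholesky factor of an SPD tridiagonal matrix.

    diag: main diagonal (n,), off: first off-diagonal (n-1,).
    Returns (ld, lo): main and sub-diagonal of the lower factor L.
    """
    n = diag.size
    ld = np.empty(n)
    lo = np.empty(n - 1)
    ld[0] = np.sqrt(diag[0])
    for i in range(1, n):
        lo[i - 1] = off[i - 1] / ld[i - 1]
        ld[i] = np.sqrt(diag[i] - lo[i - 1] ** 2)
    return ld, lo

def solve_upper_bidiag(ld, lo, z):
    """Back-substitution for L^T x = z, where L^T is upper bidiagonal."""
    n = z.size
    x = np.empty(n)
    x[-1] = z[-1] / ld[-1]
    for i in range(n - 2, -1, -1):
        x[i] = (z[i] - lo[i] * x[i + 1]) / ld[i]
    return x

def sample_tridiag_gmrf(diag, off, rng):
    """One draw x ~ N(0, Q^{-1}) for tridiagonal precision Q."""
    ld, lo = tridiag_cholesky(diag, off)
    return solve_upper_bidiag(ld, lo, rng.standard_normal(diag.size))

# Toy precision: tau * (D - rho * M) for a path graph of 8 regions.
n, rho, tau = 8, 0.9, 1.0
degree = np.array([1.0] + [2.0] * (n - 2) + [1.0])
diag = tau * degree
off = np.full(n - 1, -tau * rho)
rng = np.random.default_rng(0)
draw = sample_tridiag_gmrf(diag, off, rng)
```

Because $x = L^{-\top} z$ has covariance $(L L^\top)^{-1} = Q^{-1}$, no dense inverse is ever formed, which is the property that makes block updates of spatial random effects feasible inside a Gibbs sampler.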
5. Applications and Empirical Results
Empirical applications illustrate the operational versatility and impact of bivariate spatial clustering frameworks:
- Epidemiology: Analysis of SEER Program data for esophagus and lung cancer in California using BDAGAR revealed strong endemic and spatial associations; nearly all counties showed positive inter-cancer correlation, with lung cancer displaying higher spatial autocorrelation (ρ ≈ 0.47 vs. 0.11) (Gao et al., 2019). Model comparison via WAIC favored the hierarchical DAG approach over multivariate CAR competitors.
- Socioeconomic Analysis: Bayesian spatio-temporal clustering of Italian unemployment rates yielded meaningful regional partitions, revealing both enduring (North–South) and previously unrecognized disparities in unemployment dynamics; clusters offered enhanced statistical precision and interpretability for policymakers (Mozdzen et al., 2022).
- Environmental Time Series: Model-based clustering of bivariate satellite time series through quantile regression identified homogeneous areas for trophic status in the Gulf of Gabes, exploiting the robustness of the asymmetric Laplace distribution to distributional asymmetry and temporal dependence (Musau et al., 2022).
- Robust Cluster Detection: Incorporation of spatially informed penalties (forest lasso) with bivariate spline estimation enabled accurate recovery of spatially contiguous clusters in covariate–outcome relationships, yielding clusters aligned with known regional patterns in satellite–ground temperature modeling (Zhang et al., 17 Apr 2024).
6. Advances: Outlier Detection, Influence, and the Limits of LISA
Traditional LISA metrics, while powerful for mapping local spatial autocorrelation, are susceptible to distortion by spatial outliers, which can mask key spatial relationships (Arbia et al., 23 Oct 2024). The introduction of the Local Influence Function (LIF) quantifies the influence of individual observations on global spatial indicators (e.g., Moran’s I), incorporating both the numerical extremity and local spatial topology. The LIF is derived as an integrated (absolute) influence over a contamination range, capturing how a particular observation distorts or drives global spatial statistics, beyond what LISA alone can diagnose. Empirical results show that LIF can detect influential sites that do not correspond to significant LISA clusters, providing an additional diagnostic critical for robust spatial analysis.
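The notion of integrating an observation's influence over a contamination range can be sketched with a simple perturbation experiment. The code below is not the LIF as defined in the cited work. It is a simplified analogue, assuming that the value at one site is shifted by each `delta` in a grid, the absolute change in global Moran's I is recorded, and the result is integrated with the trapezoid rule. All names and the contamination grid are hypothetical.

```python
import numpy as np

def ring_adjacency(n):
    """Contiguity matrix for n sites arranged on a ring (toy layout)."""
    V = np.zeros((n, n))
    for i in range(n):
        V[i, (i + 1) % n] = V[(i + 1) % n, i] = 1.0
    return V

def global_moran(x, V):
    """Textbook global Moran's I."""
    d = x - x.mean()
    return x.size / V.sum() * (d @ V @ d) / (d @ d)

def integrated_influence(x, V, i, deltas):
    """Integrated absolute change in Moran's I when x[i] is shifted by delta.

    Assumption: a perturbation-based stand-in, not the published LIF formula.
    """
    base = global_moran(x, V)
    shift = np.empty(len(deltas))
    for k, delta in enumerate(deltas):
        xc = x.copy()
        xc[i] += delta                      # contaminate a single site
        shift[k] = abs(global_moran(xc, V) - base)
    # Trapezoid rule over the contamination grid.
    return float(np.sum((shift[1:] + shift[:-1]) / 2 * np.diff(deltas)))

rng = np.random.default_rng(3)
n = 15
V = ring_adjacency(n)
x = rng.normal(size=n)
x[3] = 6.0                                  # planted outlier
deltas = np.linspace(-2.0, 2.0, 21)
scores = np.array([integrated_influence(x, V, i, deltas) for i in range(n)])
```

Ranking sites by `scores` gives a diagnostic that depends on both the value at a site and its neighbourhood configuration, which is why it can flag sites that a significance map of local indicators would not.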
7. Synthesis, Limitations, and Future Directions
Bivariate spatial clustering analysis via LISA and allied frameworks has matured into a multifaceted methodological domain. Theoretical advances clarify the centrality of appropriate normalization and standardization, and models integrating spatial structure and bivariate association (e.g., via hierarchical graphical models) now provide state-of-the-art performance and interpretability. Limitations persist with respect to computational scaling in high-dimensional settings, sensitivity to model specification (e.g., in quantile combinations or penalty weighting), and, in some cases, the treatment of irregular spatial domains or complex spatial networks. Future directions include further integration of robust diagnostics (such as LIF), adaptation to multivariate (beyond bivariate) settings (Jin et al., 11 Sep 2024), and harmonization of model-based and indicator-based analyses to provide comprehensive inference pipelines for complex spatial data.
| Key Bivariate Spatial Clustering Framework | Main Technique | Canonical Reference(s) |
|---|---|---|
| BDAGAR graphical models | Bayesian hierarchical DAG priors for joint random effects | (Gao et al., 2019) |
| Canonical LISA metrics | Globally normalized local Moran's I / Geary's C | (Chen, 2022) |
| Forest lasso/triangulated spline regression | Penalized least squares with adaptive fusion penalty | (Zhang et al., 17 Apr 2024) |
| RKAD | Generalized Ripley’s K for areal data | (Self et al., 2022) |
| SCFA (Clustered Factor Analysis) | Penalized likelihood, iterative K-means for clusters | (Jin et al., 11 Sep 2024) |
The modern landscape of bivariate spatial clustering analysis is defined by the interplay between sound statistical grounding, computational tractability, and the capacity for local/global interpretability—key features that continue to drive methodological advances and impact in spatial epidemiology, environmental science, socioeconomics, and beyond.