Bayesian spatial functional data clustering: applications in disease surveillance

Published 17 Jul 2024 in stat.ME | (2407.12633v1)

Abstract: Our method extends the application of random spanning trees to cases where the response variable belongs to the exponential family, making it suitable for a wide range of real-world scenarios, including non-Gaussian likelihoods. The proposed model addresses the limitations of previous spatial clustering methods by allowing all within-cluster model parameters to be cluster-specific, thus offering greater flexibility. Additionally, we propose a Bayesian inference algorithm that overcomes the computational challenges associated with the reversible jump Markov chain Monte Carlo (RJ-MCMC) algorithm by employing composition sampling and the integrated nested Laplace approximation (INLA) to compute the marginal distribution necessary for the acceptance probability. This enhancement improves the mixing and feasibility of Bayesian inference for complex models. We demonstrate the effectiveness of our approach through simulation studies and apply it to real-world disease mapping applications: COVID-19 in the United States of America, and dengue fever in the states of Minas Gerais and São Paulo, Brazil. Our results highlight the model's capability to uncover meaningful spatial patterns and temporal dynamics in disease outbreaks, providing valuable insights for public health decision-making and resource allocation.

Summary

The paper introduces a novel Bayesian spatial functional clustering model that combines random spanning trees with latent Gaussian models and can handle non-Gaussian disease data.
Simulation studies validated the model's ability to recover true clusters, and applications to COVID-19 and dengue data successfully identified spatially coherent disease risk patterns.
This method provides public health officials with actionable insights into disease spread by clustering regions with similar temporal risk patterns, aiding targeted interventions and resource allocation.

The paper introduces a new approach to spatial functional data clustering aimed at disease surveillance. It uses a Bayesian model to group geographically contiguous regions into spatial clusters that share similar temporal patterns of disease risk. By combining random spanning trees with latent Gaussian models, the approach can accommodate the non-Gaussian data, such as disease counts, that are common in real-world applications. This adds a flexible tool to the methods available for public health decision-making.

Methodological Framework

The core of the approach is a Bayesian spatial functional clustering model with two components: a spanning tree partition model, which determines cluster memberships, and a latent Gaussian model, which describes the data within each cluster. Its main novelty is that the response variable may follow an exponential-family distribution, so the model applies to a variety of disease metrics, such as case counts.
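To make the two components concrete, a generic latent Gaussian model with a spanning-tree partition can be written as follows. This is an illustrative sketch, not the paper's exact specification: the notation (region $i$, time $t$, cluster label $z_i$, link function $g$) and the split of the linear predictor into a linear trend, a seasonal term $s_k(t)$, and an AR(1) term $u_{k,t}$ are assumptions chosen to mirror the within-cluster features (trend, seasonality, autoregression) that this summary describes.

```latex
\begin{aligned}
y_{it} \mid \mu_{it} &\sim \mathrm{ExpFam}(\mu_{it}), \qquad g(\mu_{it}) = \eta_{it}, \qquad i = 1,\dots,n,\; t = 1,\dots,T,\\
\eta_{it} &= \beta_{0,z_i} + \beta_{1,z_i}\,t + s_{z_i}(t) + u_{z_i,t},\\
u_{k,t} &= \rho_k\,u_{k,t-1} + \varepsilon_{k,t}, \qquad \varepsilon_{k,t} \sim \mathcal{N}(0,\sigma_k^2),\\
\mathbf{z} &= (z_1,\dots,z_n) \sim \text{spanning-tree partition prior}.
\end{aligned}
```

Every quantity indexed by the cluster $k$ ($\beta_{0,k}$, $\beta_{1,k}$, $s_k$, $\rho_k$, $\sigma_k^2$) may differ across clusters, which is the flexibility the abstract emphasizes. For Poisson counts with expected counts $E_{it}$, a common choice is $g(\mu_{it}) = \log(\mu_{it}/E_{it})$, i.e. a log link with offset $\log E_{it}$.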

Random spanning trees are used to partition the study region into clusters. Spatially contiguous units are grouped according to the similarity of their temporal evolution of disease risk, and each cluster can have its own trend, seasonality, and autoregressive behavior, with all within-cluster parameters allowed to be cluster-specific. This flexibility departs from previous spatial clustering methods, which do not let every within-cluster parameter vary by cluster and can therefore miss localized variation in disease progression.
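A common construction in spanning-tree partition models is to draw a spanning tree of the regions' adjacency graph and then delete $k-1$ of its edges: because a tree has no cycles, this leaves exactly $k$ connected components, and each component is automatically spatially contiguous. The sketch below illustrates that mechanism only. Its assumptions are loud ones: the toy 3x3 lattice (`grid_adjacency`) stands in for a real map, and the tree is drawn with Kruskal's algorithm on a shuffled edge list, which gives a random but not uniformly distributed spanning tree; the paper's actual priors over trees and over the number of clusters may differ.

```python
import random


class DisjointSet:
    """Union-find over region indices 0..n-1."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        """Merge the sets containing a and b; return False if already joined."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[rb] = ra
        return True


def random_spanning_tree(n, edges, rng):
    """Kruskal's algorithm on a shuffled edge list (equivalent to i.i.d. random
    edge weights). Gives a random spanning tree of a connected graph, but NOT
    a uniformly distributed one."""
    shuffled = list(edges)
    rng.shuffle(shuffled)
    dsu = DisjointSet(n)
    tree = []
    for a, b in shuffled:
        if dsu.union(a, b):  # keep the edge only if it joins two components
            tree.append((a, b))
    return tree


def partition_from_tree(n, tree, k, rng):
    """Delete k - 1 randomly chosen tree edges; each remaining connected
    component is a spatially contiguous cluster. Returns labels 0..k-1."""
    cut = set(rng.sample(range(len(tree)), k - 1))
    dsu = DisjointSet(n)
    for idx, (a, b) in enumerate(tree):
        if idx not in cut:
            dsu.union(a, b)
    root_to_label = {}
    return [root_to_label.setdefault(dsu.find(i), len(root_to_label)) for i in range(n)]


def grid_adjacency(rows, cols):
    """Hypothetical rook adjacency of a rows x cols lattice (toy stand-in for a map)."""
    edges = []
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if c + 1 < cols:
                edges.append((i, i + 1))
            if r + 1 < rows:
                edges.append((i, i + cols))
    return edges


rng = random.Random(42)
edges = grid_adjacency(3, 3)
tree = random_spanning_tree(9, edges, rng)
labels = partition_from_tree(9, tree, k=3, rng=rng)
print(labels)
```

In an MCMC sampler, moves that cut or re-join tree edges change the number of clusters while preserving contiguity by construction, which is why spanning trees are a convenient way to represent spatial partitions.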

Inference is carried out with a modified reversible jump Markov chain Monte Carlo (RJ-MCMC) algorithm. The key innovation is to combine composition sampling with the integrated nested Laplace approximation (INLA), which computes the marginal distribution needed for the acceptance probability. According to the authors, this improves mixing and makes Bayesian inference feasible for complex models that pose serious computational challenges for standard RJ-MCMC.
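The idea of an acceptance probability built from marginal likelihoods, followed by composition sampling, can be sketched as follows. This is not the paper's algorithm: INLA is not reproduced here, and a conjugate Poisson-Gamma model (counts $y_i \sim \text{Poisson}(E_i \lambda_k)$, $\lambda_k \sim \text{Gamma}(a, b)$), whose marginal likelihood has a closed form, is swapped in as a stand-in for the INLA approximation. "Composition sampling" is taken here to mean factoring $p(\lambda, \mathbf{z} \mid y) = p(\lambda \mid \mathbf{z}, y)\, p(\mathbf{z} \mid y)$: the partition is updated with the cluster parameters integrated out, and the parameters are then drawn given the accepted partition. The prior and proposal ratios are hypothetical placeholders.

```python
import math
import random


def log_marginal_poisson_gamma(y, E, a=1.0, b=1.0):
    """Log marginal likelihood of one cluster, assuming y_i ~ Poisson(E_i * lam)
    and lam ~ Gamma(shape=a, rate=b). Closed-form stand-in for INLA."""
    sy, se = sum(y), sum(E)
    const = sum(yi * math.log(ei) - math.lgamma(yi + 1) for yi, ei in zip(y, E))
    return (const + a * math.log(b) - math.lgamma(a)
            + math.lgamma(a + sy) - (a + sy) * math.log(b + se))


def split_log_acceptance(y, E, old, new1, new2,
                         log_prior_ratio=0.0, log_proposal_ratio=0.0):
    """Log acceptance probability for splitting cluster `old` (region indices)
    into `new1` and `new2`. The cluster rates are integrated out, so the move
    does not need to propose values for them."""
    def lm(idx):
        return log_marginal_poisson_gamma([y[i] for i in idx], [E[i] for i in idx])

    log_ratio = lm(new1) + lm(new2) - lm(old) + log_prior_ratio + log_proposal_ratio
    return min(0.0, log_ratio)


def composition_draw(y, E, clusters, rng, a=1.0, b=1.0):
    """Given an accepted partition, draw each cluster's rate from its conjugate
    posterior Gamma(a + sum(y), rate = b + sum(E))."""
    draws = []
    for idx in clusters:
        shape = a + sum(y[i] for i in idx)
        rate = b + sum(E[i] for i in idx)
        draws.append(rng.gammavariate(shape, 1.0 / rate))  # gammavariate takes a scale
    return draws


# Toy data: regions 0-2 have low counts, regions 3-5 have high counts.
y = [2, 3, 1, 20, 25, 22]
E = [5.0] * 6
rng = random.Random(0)
log_acc = split_log_acceptance(y, E, old=list(range(6)), new1=[0, 1, 2], new2=[3, 4, 5])
accept = rng.random() < math.exp(log_acc)
draws = composition_draw(y, E, [[0, 1, 2], [3, 4, 5]], rng)
print(log_acc, accept, draws)
```

Because whole clusters are scored by their marginal likelihood, splitting genuinely different regions is favored, while splitting homogeneous regions is penalized automatically (an Occam effect), without tuning proposals for the cluster-level parameters.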

Results and Implications

Simulation studies showed that the model recovers the true cluster structure under various scenarios, including nonlinear latent functions and non-Gaussian responses such as Poisson counts. These results indicate that the model is robust and can recover spatial clusters that reflect the underlying patterns of health risk.
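To judge whether an estimated partition recovers the true clusters, a simulation study needs a similarity score between two partitions. A common choice is the adjusted Rand index (ARI), which ignores how clusters are labeled, equals 1 for identical partitions, and is near 0 in expectation for chance-level agreement. This summary does not say which metric the paper reports, so the following is only an illustrative, dependency-free sketch of the ARI.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two partitions given as equal-length label lists."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label lists must have the same length")
    n = len(labels_true)
    if n < 2:
        return 1.0
    # Pairs of items placed together in both partitions (contingency table cells).
    sum_ij = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    # Pairs placed together in each partition separately (row and column sums).
    sum_a = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both partitions trivial
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


# Same grouping with swapped labels counts as perfect recovery.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
```

Averaging the ARI between the true partition and posterior draws (or a posterior point estimate) across simulated datasets is one way to summarize recovery.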

Applications of the model to real-world data sets, including COVID-19 case data for U.S. states and dengue fever data for the Brazilian states of Minas Gerais and São Paulo, underscore its utility. The model identified spatial clusters with coherent temporal dynamics, offering insights into the geographic spread and intensity of disease outbreaks. Grouping regions with similar temporal risk patterns can help health officials design targeted interventions and allocate resources more efficiently.

In practice, these methods give public health decision-makers actionable insights into disease spread and trends, supporting proactive rather than reactive measures. Because the model accommodates a range of non-Gaussian likelihoods, public health analyses need not be constrained by assumptions of normality.

Future Directions

The proposed model opens several avenues for future research. Incorporating adaptive sampling schemes could improve computational efficiency and help the algorithm scale to larger problems and a broader range of applications. Extending the model so that covariates inform the spatial dependence is another worthwhile direction, given the complex stochastic processes common in spatial epidemiology.

The methodology also lends itself to hybrid approaches that pair it with mechanistic disease-transmission models, potentially improving predictions in regions with sparse data or for emerging infectious diseases, where data are scarce.

In summary, this paper contributes a rigorous and versatile framework for spatial clustering of functional data in disease surveillance, highlighting the expanding role of Bayesian methods in epidemiological modeling and public health.