
Spatial Heterogeneous Additive Partial Linear Model: A Joint Approach of Bivariate Spline and Forest Lasso (2404.11579v2)

Published 17 Apr 2024 in stat.ME

Abstract: Identifying spatially heterogeneous patterns has attracted a surge of research interest in recent years, due to its important applications in various scientific and engineering fields. In practice, the spatially heterogeneous components are often mixed with components that are spatially smooth, making the task of identifying the heterogeneous regions more challenging. In this paper, we develop an efficient clustering approach to identify the model heterogeneity of the spatial additive partial linear model. Specifically, we aim to detect spatially contiguous clusters based on the regression coefficients while introducing a spatially varying intercept to deal with the smooth spatial effect. On the one hand, to approximate the spatially varying intercept, we use the method of bivariate splines over triangulation, which can effectively handle data from a complex domain. On the other hand, a novel fusion penalty termed the forest lasso is proposed to reveal the spatial clustering pattern. Our proposed fusion penalty has advantages in both estimation and computational efficiency when dealing with large spatial data. Theoretical properties of our estimator are established, and simulation results show that our approach can achieve more accurate estimation at a limited computational cost compared with existing approaches. To illustrate its practical use, we apply our approach to analyze the spatial pattern of the relationship between land surface temperature measured by satellites and air temperature measured by ground stations in the United States.
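
The abstract does not define the forest lasso, so the following is only a minimal sketch of the general idea behind graph-based fusion penalties, under stated assumptions: it builds a Euclidean minimum spanning tree over the sampling locations and sums the L1 differences between the coefficient vectors of locations joined by a tree edge. The function names `minimum_spanning_tree` and `tree_fusion_penalty`, the choice of a single spanning tree, and the L1 form are all illustrative assumptions, not the paper's estimator.

```python
import math


def minimum_spanning_tree(coords):
    """Prim's algorithm on the complete Euclidean graph over `coords`.

    Returns a list of (i, j) index pairs, one per tree edge.
    Illustrative helper only; not taken from the paper.
    """
    n = len(coords)
    if n == 0:
        return []
    in_tree = [False] * n
    best_dist = [math.inf] * n   # cheapest known link from each node to the tree
    best_parent = [-1] * n       # tree node that provides that cheapest link
    best_dist[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the node outside the tree that is closest to it.
        u = min((i for i in range(n) if not in_tree[i]),
                key=lambda i: best_dist[i])
        in_tree[u] = True
        if best_parent[u] >= 0:
            edges.append((best_parent[u], u))
        # Update the cheapest links of the remaining nodes.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(coords[u], coords[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_parent[v] = u
    return edges


def tree_fusion_penalty(betas, edges, lam=1.0):
    """lam * sum over tree edges (i, j) of ||beta_i - beta_j||_1.

    Assumed penalty form, for illustration: coefficients that are equal
    across an edge contribute nothing, so minimizing a loss plus this
    term encourages spatially contiguous groups with shared coefficients.
    """
    return lam * sum(
        sum(abs(a - b) for a, b in zip(betas[i], betas[j]))
        for i, j in edges
    )
```

With n locations, the tree has n - 1 edges, so a penalty of this kind has far fewer terms than one over all n(n - 1)/2 pairs, which is one reason tree-based fusion penalties are attractive for large spatial data.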

