
SpaCE: The Spatial Confounding Environment (2312.00710v3)

Published 1 Dec 2023 in cs.LG, stat.ME, and stat.ML

Abstract: Spatial confounding poses a significant challenge in scientific studies involving spatial data, where unobserved spatial variables can influence both treatment and outcome, possibly leading to spurious associations. To address this problem, we introduce SpaCE: The Spatial Confounding Environment, the first toolkit to provide realistic benchmark datasets and tools for systematically evaluating causal inference methods designed to alleviate spatial confounding. Each dataset includes training data, true counterfactuals, a spatial graph with coordinates, and smoothness and confounding scores characterizing the effect of a missing spatial confounder. It also includes realistic semi-synthetic outcomes and counterfactuals, generated using state-of-the-art machine learning ensembles, following best practices for causal inference benchmarks. The datasets cover real treatment and covariates from diverse domains, including climate, health, and social sciences. SpaCE facilitates an automated end-to-end pipeline, simplifying data loading, experimental setup, and the evaluation of machine learning and causal inference models. The SpaCE project provides several dozen datasets of diverse sizes and spatial complexity. It is publicly available as a Python package, encouraging community feedback and contributions.
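
The core problem the abstract describes, an unobserved spatial variable that drives both treatment and outcome, can be illustrated with a small simulation. The sketch below is not the SpaCE package or its API: the grid layout, the sinusoidal confounder, the coefficients, and the helper names `simulate_spatial_confounding` and `ols_slope` are all assumptions chosen for illustration. It compares a naive regression of outcome on treatment with one that also adjusts for the confounder, which is only possible here because the simulation knows it.

```python
import numpy as np


def simulate_spatial_confounding(n_side=30, true_effect=1.0, seed=0):
    """Simulate treatment and outcome on a grid sharing a smooth confounder.

    All functional forms and coefficients are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    # Regular grid of locations on the unit square.
    xs, ys = np.meshgrid(np.linspace(0, 1, n_side), np.linspace(0, 1, n_side))
    # Hypothetical smooth spatial confounder: varies slowly across space.
    u = (np.sin(2 * np.pi * xs) + np.cos(2 * np.pi * ys)).ravel()
    n = u.size
    # The confounder influences both the treatment and the outcome.
    treatment = 0.8 * u + rng.normal(scale=0.5, size=n)
    outcome = true_effect * treatment + 1.5 * u + rng.normal(scale=0.5, size=n)
    return treatment, outcome, u


def ols_slope(y, *regressors):
    """Return the least-squares coefficient on the first regressor."""
    X = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]  # index 0 is the intercept


treatment, outcome, u = simulate_spatial_confounding()

# Naive estimate: the confounder is treated as unobserved.
naive = ols_slope(outcome, treatment)
# Oracle estimate: adjusts for the confounder, unavailable in practice.
adjusted = ols_slope(outcome, treatment, u)

print(f"true effect: 1.0, naive: {naive:.2f}, adjusted: {adjusted:.2f}")
```

In this toy setup the naive slope absorbs part of the confounder's effect and overstates the treatment effect, while the adjusted slope recovers it. Benchmarks with known counterfactuals make it possible to measure this kind of gap for methods that do not observe the confounder.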
