A Tidy Framework and Infrastructure to Systematically Assemble Spatio-temporal Indexes from Multivariate Data (2401.05812v2)
Abstract: Indexes are useful for summarizing multivariate information into single metrics for monitoring, communicating, and decision-making. While most work has focused on defining new indexes for specific purposes, more attention needs to be directed towards understanding how indexes behave under different data conditions, and how their structure affects their values and the variation in those values. Here we recommend a modular data pipeline for assembling indexes. It is universally applicable to index computation and allows investigation of index behavior as part of the development procedure. One can compute indexes with different parameter choices; adjust the index definition by adding, removing, and swapping steps to experiment with various index designs; calculate uncertainty measures; and assess index robustness. The paper presents three examples to illustrate the use of the pipeline framework: a comparison of two indexes designed to monitor the spatio-temporal distribution of drought in Queensland, Australia; the effect of dimension reduction choices on country rankings in the Global Gender Gap Index (GGGI); and the calculation of bootstrap confidence intervals for the Standardized Precipitation Index (SPI). The methods are supported by a new R package called tidyindex.
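The idea of assembling an index from swappable steps, and of attaching a bootstrap interval to it, can be illustrated with a minimal sketch. This is not the tidyindex API (tidyindex is an R package, and none of its functions are shown here). Every name below (`rescale_minmax`, `rescale_zscore`, `weighted_sum`, `run_pipeline`, `bootstrap_ci`) is hypothetical, and the percentile bootstrap over resampled rows is a generic stand-in for illustration, not the procedure the paper uses for the SPI.

```python
import random
import statistics
from typing import Callable, Dict, List, Sequence, Tuple

Record = Dict[str, float]
Step = Callable[[List[Record]], List[Record]]


def rescale_minmax(var: str) -> Step:
    """Hypothetical step: rescale one variable to [0, 1]."""
    def step(rows: List[Record]) -> List[Record]:
        values = [r[var] for r in rows]
        lo, hi = min(values), max(values)
        span = (hi - lo) or 1.0  # avoid division by zero for constant columns
        return [{**r, var: (r[var] - lo) / span} for r in rows]
    return step


def rescale_zscore(var: str) -> Step:
    """Hypothetical alternative step: centre and scale by the sample sd."""
    def step(rows: List[Record]) -> List[Record]:
        values = [r[var] for r in rows]
        mean = statistics.fmean(values)
        sd = statistics.stdev(values) or 1.0
        return [{**r, var: (r[var] - mean) / sd} for r in rows]
    return step


def weighted_sum(weights: Dict[str, float], out: str = "index") -> Step:
    """Hypothetical step: reduce several variables to one via a linear combination."""
    def step(rows: List[Record]) -> List[Record]:
        return [{**r, out: sum(w * r[v] for v, w in weights.items())} for r in rows]
    return step


def run_pipeline(rows: List[Record], steps: Sequence[Step]) -> List[Record]:
    """Apply the steps in order; each step returns new rows and leaves its input untouched."""
    for step in steps:
        rows = step(rows)
    return rows


def bootstrap_ci(rows: List[Record], steps: Sequence[Step],
                 stat: Callable[[List[Record]], float],
                 n_boot: int = 500, level: float = 0.9,
                 seed: int = 1) -> Tuple[float, float]:
    """Percentile bootstrap interval for a summary of the index (rows resampled with replacement)."""
    rng = random.Random(seed)
    draws = sorted(
        stat(run_pipeline([rng.choice(rows) for _ in rows], steps))
        for _ in range(n_boot)
    )
    alpha = (1.0 - level) / 2.0
    return draws[int(alpha * (n_boot - 1))], draws[int((1.0 - alpha) * (n_boot - 1))]


if __name__ == "__main__":
    data = [{"a": 0, "b": 10}, {"a": 5, "b": 20}, {"a": 10, "b": 30}]
    weights = {"a": 0.5, "b": 0.5}
    design_1 = [rescale_minmax("a"), rescale_minmax("b"), weighted_sum(weights)]
    # Swapping the rescaling step gives a second index design to compare against.
    design_2 = [rescale_zscore("a"), rescale_zscore("b"), weighted_sum(weights)]
    print([r["index"] for r in run_pipeline(data, design_1)])
    print([r["index"] for r in run_pipeline(data, design_2)])
    print(bootstrap_ci(data, design_1, lambda rs: statistics.fmean(r["index"] for r in rs)))
```

Because each step is a function from rows to rows, changing an index design amounts to editing the list of steps, and any uncertainty calculation can reuse the same list unchanged.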