Uncovering large inconsistencies between machine learning derived gridded settlement datasets (2404.13127v1)
Abstract: High-resolution human settlement maps provide detailed delineations of where people live and are vital for scientific and practical purposes, such as rapid disaster response, allocation of humanitarian resources, and international development. The increased availability of high-resolution satellite imagery, combined with powerful techniques from machine learning and artificial intelligence, has spurred the creation of a wealth of settlement datasets. However, the precise agreement and alignment between these datasets is not known. Here we quantify the overlap of high-resolution settlement maps for 42 African countries developed by Google (Open Buildings), Meta (High Resolution Population Maps) and GRID3 (Geo-Referenced Infrastructure and Demographic Data for Development). Across all studied countries we find large disagreement between datasets on how much area is considered settled. We demonstrate that there are considerable geographic and socio-economic factors at play and build a machine learning model to predict the areas in which the datasets disagree. It is vital to understand the shortcomings of AI-derived high-resolution settlement layers, as international organizations, governments, and NGOs are already experimenting with incorporating these into programmatic work. As such, we anticipate our work to be a starting point for more critical and detailed analyses of AI-derived datasets for humanitarian, planning, policy, and scientific purposes.