Measuring Geographic Diversity of Foundation Models with a Natural Language–based Geo-guessing Experiment on GPT-4 (2404.07612v1)
Abstract: Generative AI based on foundation models provides a first glimpse into the world as represented by machines trained on vast amounts of multimodal data. If we consider the resulting models as knowledge bases in their own right, this may open up new avenues for understanding places through the lens of machines. In this work, we adopt this thinking and select GPT-4, a state-of-the-art representative of the family of multimodal LLMs, to study its geographic diversity, that is, how well geographic features are represented. Using DBpedia abstracts as a ground-truth corpus for probing, our natural language–based geo-guessing experiment shows that GPT-4 may currently encode insufficient knowledge about several geographic feature types on a global level. On a local level, we observe not only this insufficiency but also inter-regional disparities in GPT-4's geo-guessing performance on UNESCO World Heritage Sites, which carry significance for both local and global populations; these inter-regional disparities may become smaller as the geographic scale increases. Moreover, whether we assess geo-guessing performance on a global or a local level, we find inter-model disparities between GPT-4's unimodal and multimodal variants. We hope this work can initiate a discussion on geographic diversity as an ethical principle within the GIScience community in the face of global socio-technical challenges.