An IR-based Approach Towards Automated Integration of Geo-spatial Datasets in Map-based Software Systems (1906.06331v2)

Published 13 Jun 2019 in cs.DB

Abstract: Data is arguably the most valuable asset of the modern world. In this era, the success of any data-intensive solution relies on the quality of the data that drives it. Among the vast amounts of data that are captured, managed, and analyzed every day, geospatial data are one of the most interesting classes: they hold geographical information about real-world phenomena and can be visualized as digital maps. Geospatial data are the source of many enterprise solutions that provide local information and insights. To increase the quality of such solutions, companies continuously aggregate geospatial datasets from various sources. However, the lack of a global standard model for geospatial datasets makes the task of merging and integrating datasets difficult and error-prone. Traditionally, domain experts manually validate the data integration process, checking the merge of new data sources and/or new versions of previous data against conflicts and other requirement violations. However, this approach is not scalable and hinders rapid releases when dealing with frequently changing big datasets. Thus, more automated approaches that require limited interaction with domain experts are needed. As a first step toward tackling this problem, in this paper we leverage Information Retrieval (IR) and geospatial search techniques to propose a systematic and automated conflict identification approach. To evaluate our approach, we conduct a case study in which we measure its accuracy in several real-world scenarios, and we interview software developers at Localintel Inc. (our industry partner) to get their feedback.
