Accelerating Spatio-Textual Queries with Learned Indices (2312.09864v1)
Abstract: Efficiently computing spatio-textual queries has become increasingly important in various applications that need to quickly retrieve geolocated entities associated with textual information, such as in location-based services and social networks. To accelerate such queries, several works have proposed combining spatial and textual indices into hybrid index structures. Recently, the novel idea of replacing traditional indices with ML models has attracted a lot of attention. This includes works on learned spatial indices, where the main challenge is to address the lack of a total ordering among objects in a multidimensional space. In this work, we investigate how to extend this novel type of index design to the case of spatio-textual data. We study different design choices, based on either loose or tight coupling between the spatial and textual part, as well as a hybrid index that combines a traditional and a learned component. We also perform an experimental evaluation using several real-world datasets to assess the potential benefits of using a learned index for evaluating spatio-textual queries.
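To make the ordering challenge concrete, the following is a minimal sketch of one possible keyword-first arrangement. It is an illustration under stated assumptions, not the design evaluated in the paper. It assumes points lie on an integer grid, imposes a total order with a Z-order (Morton) curve, and stands in for the ML model with a single least-squares line plus a recorded maximum error. The names `z_order`, `LearnedIndex`, and `KeywordFirstIndex` are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict
import math


def z_order(x, y, bits=16):
    """Interleave the bits of integer grid coordinates x and y (Morton code)."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


class LearnedIndex:
    """Sorted (key, value) pairs plus a linear model that predicts positions."""

    def __init__(self, pairs):
        pairs = sorted(pairs)
        self.keys = [k for k, _ in pairs]
        self.values = [v for _, v in pairs]
        n = len(self.keys)
        mean_k = sum(self.keys) / n
        mean_p = (n - 1) / 2
        var = sum((k - mean_k) ** 2 for k in self.keys)
        cov = sum((k - mean_k) * (i - mean_p) for i, k in enumerate(self.keys))
        # Least-squares fit of position against key; flat line if all keys are equal.
        self.slope = cov / var if var > 0 else 0.0
        self.intercept = mean_p - self.slope * mean_k
        # Largest prediction error over the stored keys, rounded up.
        self.err = max(
            math.ceil(abs(self._predict(k) - i)) for i, k in enumerate(self.keys)
        )

    def _predict(self, key):
        return self.slope * key + self.intercept

    def lower_bound(self, key):
        """Position of the first stored key >= key, searched near the prediction."""
        n = len(self.keys)
        p = self._predict(key)
        lo = max(0, min(n, math.floor(p) - self.err - 1))
        hi = max(lo, min(n, math.ceil(p) + self.err + 1))
        return bisect_left(self.keys, key, lo, hi)


class KeywordFirstIndex:
    """One learned index over Z-order codes per keyword (an assumed layout)."""

    def __init__(self, objects):
        # objects: iterable of (obj_id, x, y, keywords)
        self.location = {oid: (x, y) for oid, x, y, _ in objects}
        buckets = defaultdict(list)
        for oid, x, y, keywords in objects:
            for kw in keywords:
                buckets[kw].append((z_order(x, y), oid))
        self.per_keyword = {kw: LearnedIndex(p) for kw, p in buckets.items()}

    def range_query(self, x1, y1, x2, y2, keyword):
        """Ids of objects containing keyword inside the rectangle [x1,x2] x [y1,y2]."""
        index = self.per_keyword.get(keyword)
        if index is None:
            return []
        # Every point in the rectangle has a code between those of its two corners.
        start = index.lower_bound(z_order(x1, y1))
        end = index.lower_bound(z_order(x2, y2) + 1)
        hits = []
        for oid in index.values[start:end]:
            x, y = self.location[oid]
            # The Z-order interval also covers cells outside the rectangle.
            if x1 <= x <= x2 and y1 <= y <= y2:
                hits.append(oid)
        return sorted(hits)
```

The Z-order interval between the two rectangle corners is a superset of the query region, so the final coordinate check is required for correctness. A practical index would likely use a more accurate model and split the interval to reduce false positives.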