Learned spatial data partitioning (2306.04846v2)
Abstract: As spatial datasets grow rapidly, distributed parallel processing systems have become essential for analyzing them efficiently. In this paper, we present the first study of learned spatial data partitioning, which uses machine learning techniques to assign groups of spatial data to computers based on the locations of the data. We formalize spatial data partitioning in the context of reinforcement learning and develop a novel deep reinforcement learning algorithm. Our learning algorithm leverages features of spatial data partitioning and prunes ineffective learning processes to find optimal partitions efficiently. Our experimental study, which uses Apache Sedona and real-world spatial data, demonstrates that our method efficiently finds partitions that accelerate distance join queries and reduces the workload run time by up to 59.4%.
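To make concrete why the choice of location-based partitions matters, the following is a minimal sketch, not the paper's reinforcement learning algorithm. It compares a fixed uniform grid with a simple recursive median split on synthetic, skewed 2D points and reports how unevenly each scheme loads the machines. The function names (`grid_partition`, `median_split_partition`, `load_skew`) and the synthetic data are assumptions made for illustration only.

```python
import random

def grid_partition(points, cells_per_side):
    """Assign points in the unit square to a fixed uniform grid of cells."""
    cells = {(i, j): [] for i in range(cells_per_side) for j in range(cells_per_side)}
    for x, y in points:
        cx = min(int(x * cells_per_side), cells_per_side - 1)
        cy = min(int(y * cells_per_side), cells_per_side - 1)
        cells[(cx, cy)].append((x, y))
    return list(cells.values())

def median_split_partition(points, num_partitions):
    """Repeatedly split the largest block at its median, alternating x and y."""
    blocks = [(list(points), 0)]  # (points in block, split depth)
    while len(blocks) < num_partitions:
        blocks.sort(key=lambda b: len(b[0]), reverse=True)
        pts, depth = blocks.pop(0)
        axis = depth % 2
        pts.sort(key=lambda p: p[axis])
        mid = len(pts) // 2
        blocks.append((pts[:mid], depth + 1))
        blocks.append((pts[mid:], depth + 1))
    return [pts for pts, _ in blocks]

def load_skew(partitions):
    """Largest partition size divided by the mean size (1.0 means perfectly balanced)."""
    sizes = [len(p) for p in partitions]
    return max(sizes) / (sum(sizes) / len(sizes))

# Synthetic skewed data (hypothetical): 900 points in one dense corner, 100 spread out.
random.seed(0)
dense = [(random.random() * 0.1, random.random() * 0.1) for _ in range(900)]
sparse = [(random.random(), random.random()) for _ in range(100)]
points = dense + sparse

grid_parts = grid_partition(points, cells_per_side=2)          # 4 fixed cells
split_parts = median_split_partition(points, num_partitions=4)  # 4 data-driven blocks

print("grid skew:", load_skew(grid_parts))
print("median-split skew:", load_skew(split_parts))
```

On skewed data the fixed grid places most points on one machine, while the data-driven split balances them. Balance alone does not determine distance join cost, since objects near partition boundaries also matter, which is one reason a learned, workload-aware choice of partitions can be worthwhile.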