A Survey of Learned Indexes for the Multi-dimensional Space (2403.06456v1)
Abstract: A recent research trend involves treating database index structures as Machine Learning (ML) models. In this domain, single or multiple ML models are trained to learn the mapping from keys to positions inside a data set. This class of indexes is known as "Learned Indexes." Learned indexes have demonstrated improved search performance and reduced space requirements for one-dimensional data. The concept of one-dimensional learned indexes has naturally been extended to multi-dimensional (e.g., spatial) data, leading to the development of "Learned Multi-dimensional Indexes". This survey focuses on learned multi-dimensional index structures. Specifically, it reviews the current state of this research area, explains the core concepts behind each proposed method, and classifies these methods based on several well-defined criteria. We present a taxonomy that classifies and categorizes each learned multi-dimensional index, and survey the existing literature on learned multi-dimensional indexes according to this taxonomy. Additionally, we present a timeline to illustrate the evolution of research on learned indexes. Finally, we highlight several open challenges and future research directions in this emerging and highly active field.
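To make the abstract's central idea concrete (a model trained to learn the mapping from keys to positions), the following is a minimal one-dimensional sketch. It is not taken from the survey or from any specific index it reviews. The class name `LinearLearnedIndex`, the single least-squares line used as the model, and the error-bounded binary search that corrects the prediction are all assumptions chosen for illustration. Keys are assumed to be distinct numbers.

```python
import bisect


class LinearLearnedIndex:
    """Toy learned index (hypothetical): one linear model maps key -> position."""

    def __init__(self, keys):
        self.keys = sorted(keys)
        n = len(self.keys)
        if n == 0:
            raise ValueError("at least one key is required")

        # Fit position ~ slope * key + intercept by ordinary least squares.
        mean_key = sum(self.keys) / n
        mean_pos = (n - 1) / 2
        var = sum((k - mean_key) ** 2 for k in self.keys)
        cov = sum((k - mean_key) * (p - mean_pos)
                  for p, k in enumerate(self.keys))
        self.slope = cov / var if var else 0.0
        self.intercept = mean_pos - self.slope * mean_key

        # Record the worst prediction error over the indexed keys so that
        # every lookup can restrict its search to a bounded window.
        self.max_err = max(abs(self._predict(k) - p)
                           for p, k in enumerate(self.keys))

    def _predict(self, key):
        # Model inference: estimated position of `key` in the sorted array.
        return int(round(self.slope * key + self.intercept))

    def lookup(self, key):
        """Return the position of `key` in the sorted array, or None if absent."""
        n = len(self.keys)
        guess = self._predict(key)
        # Clamp the search window [lo, hi) to the array bounds.
        lo = max(0, min(n, guess - self.max_err))
        hi = max(lo, min(n, guess + self.max_err + 1))
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < n and self.keys[i] == key:
            return i
        return None


index = LinearLearnedIndex([2, 3, 5, 8, 13, 21, 34, 55, 89, 144])
print(index.lookup(13))  # 4
print(index.lookup(4))   # None
```

The model replaces the traversal of a conventional tree index with a prediction, and the stored maximum error keeps lookups exact for indexed keys. The multi-dimensional indexes covered by the survey need additional machinery that this sketch does not attempt to show.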