Online List Labeling with Predictions (2305.10536v2)
Abstract: A growing line of work shows how learned predictions can be used to break through worst-case barriers and improve the running time of an algorithm. However, incorporating predictions into data structures with strong theoretical guarantees remains underdeveloped. This paper takes a step in this direction by showing that predictions can be leveraged in the fundamental online list labeling problem. In this problem, n items arrive over time and must be stored in sorted order in an array of size Θ(n). The array slot of an element is its label, and the goal is to maintain sorted order while minimizing the total number of elements moved (i.e., relabeled). We design a new list labeling data structure and bound its performance in two models. In the worst-case learning-augmented model, we give guarantees in terms of the error in the predictions. Our data structure provides strong guarantees: it is optimal for any prediction error and guarantees the best-known worst-case bound even when the predictions are entirely erroneous. We also consider a stochastic error model and bound the performance in terms of the expectation and variance of the error. Finally, we demonstrate the theoretical results empirically. In particular, we show that our data structure performs well on real temporal data sets where predictions are constructed from elements that arrived in the past, as is typically done in practice.
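To make the cost model concrete, here is a minimal Python sketch of a naive baseline. It is not the learned data structure from this paper. Keys are kept in sorted order in a fixed-size array with empty slots. When an insertion finds no gap at its position, the neighbouring run of elements is shifted into the nearest empty slot. The class name `NaiveListLabeling` and its methods are hypothetical, and the sketch counts only relabels of previously stored elements, not the placement of the new key.

```python
from typing import List, Optional


class NaiveListLabeling:
    """Hypothetical baseline (not the paper's structure): a sorted array with
    gaps that, on a collision, shifts a run of elements into the nearest gap.
    `moves` counts relabels of elements that were already stored."""

    def __init__(self, capacity: int) -> None:
        self.slots: List[Optional[int]] = [None] * capacity
        self.moves = 0

    def keys(self) -> List[int]:
        """Stored keys in array (label) order."""
        return [s for s in self.slots if s is not None]

    def insert(self, x: int) -> None:
        m = len(self.slots)
        if None not in self.slots:
            raise OverflowError("array is full")
        # j: first occupied slot whose key exceeds x (m if there is none).
        # Every occupied slot before j holds a key <= x.
        j = next((i for i, s in enumerate(self.slots)
                  if s is not None and s > x), m)
        # Case 1: the slot just before j is free, so x fits with no moves.
        if j > 0 and self.slots[j - 1] is None:
            self.slots[j - 1] = x
            return
        # Case 2: shift slots[j:k] one step right into the nearest gap k >= j.
        k = next((i for i in range(j, m) if self.slots[i] is None), None)
        if k is not None:
            self.slots[j + 1:k + 1] = self.slots[j:k]
            self.slots[j] = x
            self.moves += k - j
            return
        # Case 3: no gap to the right; shift slots[k+1:j] one step left
        # into the nearest gap k < j - 1, then place x at j - 1.
        k = next(i for i in range(j - 1, -1, -1) if self.slots[i] is None)
        self.slots[k:j - 1] = self.slots[k + 1:j]
        self.slots[j - 1] = x
        self.moves += (j - 1) - k


if __name__ == "__main__":
    arr = NaiveListLabeling(capacity=4)
    for key in [1, 2, 3, 4]:
        arr.insert(key)
    print(arr.keys(), arr.moves)  # [1, 2, 3, 4] 6
```

Inserting keys in increasing order is a bad case for this baseline. The k-th insertion shifts k − 1 elements, so filling a capacity-4 array with 1, 2, 3, 4 costs 0 + 1 + 2 + 3 = 6 moves, and n ascending insertions cost quadratically many moves in total. List labeling structures avoid this by redistributing elements over larger windows, and the structure in this paper additionally uses predictions to guide where elements are placed.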