Learning-Augmented Search Data Structures (2402.10457v2)
Abstract: We study the integration of machine learning advice to improve upon traditional data structures designed for efficient search queries. Although there has been recent effort in improving the performance of binary search trees using machine learning advice, e.g., Lin et al. (ICML 2022), the resulting constructions nevertheless suffer from inherent weaknesses of binary search trees, such as the complexity of maintaining balance across multiple updates and the inability to handle partially-ordered or high-dimensional datasets. For these reasons, we focus on skip lists and KD trees in this work. Given access to a possibly erroneous oracle that outputs estimated fractional frequencies for search queries on a set of items, we construct skip lists and KD trees that provably provide the optimal expected search time, within nearly a factor of two. In fact, our learning-augmented skip lists and KD trees are still optimal up to a constant factor, even if the oracle is only accurate within a constant factor. We also demonstrate robustness by showing that our data structures achieve an expected search time that is within a constant factor of an oblivious skip list/KD tree construction even when the predictions are arbitrarily incorrect. Finally, we empirically show that our learning-augmented search data structures outperform their corresponding traditional analogs on both synthetic and real-world datasets.
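To make the idea of a frequency-aware skip list concrete, here is a minimal Python sketch. It is not the paper's construction: the height rule in `predicted_height` (a geometric coin-flip height, raised to roughly log2(n * p) for an item with predicted frequency p) is an assumption chosen for illustration, as are all function names and the `steps` cost measure. Taking the maximum with the usual random height is one plausible way to keep the structure no worse than an oblivious skip list when predictions are poor.

```python
import math
import random


class Node:
    def __init__(self, key, height):
        self.key = key
        self.next = [None] * height  # next[i] is the successor on level i


def predicted_height(p, n, rng, max_height):
    """Hypothetical rule (an assumption, not taken from the paper):
    draw the usual geometric height, then raise it to about
    log2(n * p) for items with a large predicted frequency p."""
    h = 1
    while h < max_height and rng.random() < 0.5:
        h += 1
    boost = 1 + int(math.log2(max(1.0, n * p)))
    return min(max_height, max(h, boost))


def build(keys, freq, seed=0):
    """Build a static skip list over `keys`; `freq` maps a key to its
    predicted fractional query frequency (missing keys count as 0)."""
    rng = random.Random(seed)
    keys = sorted(keys)
    n = len(keys)
    max_height = 2 * max(1, math.ceil(math.log2(n + 1))) + 1
    head = Node(None, max_height)
    last = [head] * max_height  # rightmost node linked so far on each level
    for k in keys:
        node = Node(k, predicted_height(freq.get(k, 0.0), n, rng, max_height))
        for lvl in range(len(node.next)):
            last[lvl].next[lvl] = node
            last[lvl] = node
    return head


def search(head, key):
    """Return (found, steps); steps counts pointer moves right or down."""
    node, steps = head, 0
    for lvl in reversed(range(len(head.next))):
        while node.next[lvl] is not None and node.next[lvl].key < key:
            node = node.next[lvl]
            steps += 1
        steps += 1  # drop down one level
    nxt = node.next[0]
    return (nxt is not None and nxt.key == key), steps
```

For example, `build(range(1000), {500: 0.5})` places key 500 on at least nine levels under this rule, so a search for it can skip most of the list near the top, while keys with no prediction keep their ordinary random heights.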
- Online metric algorithms with untrusted predictions. ACM Trans. Algorithms, 19(2):19:1–19:34, 2023.
- (optimal) online bipartite matching with degree information. In Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems, NeurIPS, 2022.
- Online facility location with multiple advice. In Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems, NeurIPS, pages 4661–4673, 2021.
- Secretary and online matching problems with machine learned advice. Discret. Optim., 48(Part 2):100778, 2023.
- A regression approach to learning-augmented online algorithms. In Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems, NeurIPS, pages 30504–30517, 2021.
- Online algorithms with multiple predictions. In International Conference on Machine Learning, ICML, pages 582–598, 2022.
- Customizing ML predictions for online algorithms. In Proceedings of the 37th International Conference on Machine Learning, ICML, pages 303–313, 2020.
- Online graph algorithms with predictions. In Proceedings of the 2022 ACM-SIAM Symposium on Discrete Algorithms, SODA, pages 35–66, 2022.
- Working set theorems for routing in self-adjusting skip list networks. In IEEE INFOCOM 2020-IEEE Conference on Computer Communications, pages 2175–2184. IEEE, 2020.
- Kiwi: A key-value map for scalable real-time analytics. ACM Transactions on Parallel Computing (TOPC), 7(3):1–28, 2020.
- On the economics of offline password cracking. In IEEE Symposium on Security and Privacy, SP, Proceedings, pages 853–871. IEEE Computer Society, 2018.
- The primal-dual method for learning augmented algorithms. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems, NeurIPS, 2020.
- CAIDA. The CAIDA UCSD anonymized internet traces. https://www.caida.org/catalog/datasets/passive_dataset, 2016.
- Triangle and four cycle counting with predictions in graph streams. In The Tenth International Conference on Learning Representations, ICLR, 2022.
- Streaming algorithms for support-aware histograms. In International Conference on Machine Learning, ICML, pages 3184–3203, 2022.
- Thomas M Cover. Elements of information theory. John Wiley & Sons, 1999.
- Faster fundamental graph algorithms via learned predictions. In International Conference on Machine Learning, ICML, pages 3583–3602, 2022.
- Faster matchings via learned duals. In Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems, NeurIPS, pages 10393–10406, 2021.
- Predictive flows for faster Ford-Fulkerson. In International Conference on Machine Learning, ICML, volume 202, pages 7231–7248, 2023.
- Learning-augmented k-means clustering. In The Tenth International Conference on Learning Representations, ICLR, 2022.
- Xavier Gabaix. Zipf’s law for cities: an explanation. The Quarterly journal of economics, 114(3):739–767, 1999.
- Learning-augmented algorithms for online linear and semidefinite programming. In Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems, NeurIPS, 2022.
- C. Torgeson G. Pass, A. Chowdhury. 500k user session collection. https://www.kaggle.com/datasets/dineshydv/aol-user-session-collection-500k, 2006.
- Online algorithms for rent-or-buy with expert advice. In Proceedings of the 36th International Conference on Machine Learning, ICML, pages 2319–2327, 2019.
- A skip-list approach for efficiently processing forecasting queries. Proceedings of the VLDB Endowment, 1(1):984–995, 2008.
- Learning-based frequency estimation algorithms. In 7th International Conference on Learning Representations, ICLR, 2019.
- Efficient security mechanisms for routing protocols. In NDSS, 2003.
- David A Huffman. A method for the construction of minimum-redundancy codes. Proceedings of the IRE, 40(9):1098–1101, 1952.
- Online knapsack with frequency predictions. In Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems, NeurIPS, pages 2733–2743, 2021.
- Learning-based low-rank approximations. In Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS, pages 7400–7410, 2019.
- Learning-augmented data stream algorithms. In 8th International Conference on Learning Representations, ICLR, 2020.
- Online facility location with predictions. In The Tenth International Conference on Learning Representations, ICLR, 2022.
- Learning-augmented private algorithms for multiple quantile release. In International Conference on Machine Learning, ICML 2023, pages 16344–16376, 2023.
- The case for learned index structures. In Proceedings of the 2018 International Conference on Management of Data, SIGMOD Conference, pages 489–504, 2018.
- Learning predictions for algorithms with predictions. In Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems, NeurIPS, 2022.
- A skiplist-based concurrent priority queue with minimal memory contention. In Principles of Distributed Systems: 17th International Conference, OPODIS 2013, Nice, France, December 16-18, 2013. Proceedings 17, pages 206–220. Springer, 2013.
- Phast: Hierarchical concurrent log-free skip list for persistent memory. IEEE Transactions on Parallel and Distributed Systems, 33(12):3929–3941, 2022.
- Learning the positions in CountSketch. In The Eleventh International Conference on Learning Representations, ICLR, 2023.
- Online scheduling via learned weights. In Proceedings of the 2020 ACM-SIAM Symposium on Discrete Algorithms, SODA, pages 1859–1877, 2020.
- Learning augmented binary search trees. In International Conference on Machine Learning, ICML, pages 13431–13440, 2022.
- Competitive caching with machine learned advice. J. ACM, 68(4):24:1–24:25, 2021.
- Emergent statistical laws in single-cell transcriptomic data. Physical Review E, 107(4):044403, 2023.
- Michael Mitzenmacher. A model for learned bloom filters and optimizing by sandwiching. In Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems, NeurIPS, pages 462–471, 2018.
- Algorithms with predictions. In Tim Roughgarden, editor, Beyond the Worst-Case Analysis of Algorithms, pages 646–662. Cambridge University Press, 2020.
- Improved learning-augmented algorithms for k-means and k-medians clustering. In The Eleventh International Conference on Learning Representations, ICLR, 2023.
- Improving online algorithms via ML predictions. In Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS, pages 9684–9693, 2018.
- Simplified self-adapting skip lists. In International Conference on Intelligent Data Engineering and Automated Learning, pages 126–136. Springer, 2010.
- William Pugh. Concurrent maintenance of skip lists. University of Maryland at College Park, 1990.
- William Pugh. Skip lists: A probabilistic alternative to balanced trees. Commun. ACM, 33(6):668–676, jun 1990.
- Agnar Sandmo. The principal problem in political economy: income distribution in the history of economic thought. In Handbook of income distribution, volume 2, pages 3–65. Elsevier, 2015.
- Uniform bounds for scheduling with job size estimates. In 13th Innovations in Theoretical Computer Science Conference, ITCS, pages 114:1–114:30, 2022.
- Claude Elwood Shannon. A mathematical theory of communication. ACM SIGMOBILE mobile computing and communications review, 5(1):3–55, 2001.
- Skiplist-based concurrent priority queues. In Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000, pages 263–268. IEEE, 2000.
- Improved learning-augmented algorithms for the multi-option ski rental problem via best-possible competitive analysis. In International Conference on Machine Learning, ICML, pages 31539–31561, 2023.
- Test of two hypotheses explaining the size of populations in a system of cities. Journal of Applied Statistics, 42(12):2686–2693, 2015.
- Online algorithms for multi-shop ski rental with machine learned advice. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems, NeurIPS, 2020.
- On the implications of Zipf’s law in passwords. In Computer Security - ESORICS 2016 - 21st European Symposium on Research in Computer Security, Proceedings, Part I, volume 9878 of Lecture Notes in Computer Science, pages 111–131. Springer, 2016.
- Optimal robustness-consistency trade-offs for learning-augmented online algorithms. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems, NeurIPS, 2020.
- Jellyfish: A fast skip list with mvcc. In Proceedings of the 21st International Middleware Conference, Middleware ’20, page 134–148, New York, NY, USA, 2020. Association for Computing Machinery.
- Febench: A benchmark for real-time relational data feature extraction. Proceedings of the VLDB Endowment, 16(12):3597–3609, 2023.
- S3: A scalable in-memory skip-list index for key-value store. Proc. VLDB Endow., 12(12):2183–2194, aug 2019.