
PCF Learned Sort: a Learning Augmented Sort Algorithm with $O(n \log\log n)$ Expected Complexity (2405.07122v1)

Published 12 May 2024 in cs.DS and cs.CC

Abstract: Sorting is one of the most fundamental algorithms in computer science. Recently, Learned Sorts, which use machine learning to improve sorting speed, have attracted attention. While existing studies show that Learned Sort is experimentally faster than classical sorting algorithms, they do not provide theoretical guarantees about its computational complexity. We propose PCF Learned Sort, a theoretically guaranteed Learned Sort algorithm. We prove that the expected complexity of PCF Learned Sort is $O(n \log \log n)$ under mild assumptions on the data distribution. We also confirm experimentally that PCF Learned Sort has a computational complexity of $O(n \log \log n)$ on both synthetic and real datasets. This is the first study to theoretically support the experimental success of Learned Sort, and provides evidence for why Learned Sort is fast.
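The abstract does not describe the algorithm itself, so the following is only a minimal sketch of the general learned-sort idea, not the authors' PCF Learned Sort. It assumes a very simple "model": a sorted random sample acts as a step-function estimate of the data's CDF, each element is placed in the bucket that estimate predicts, and the buckets are sorted recursively. The function name `learned_sort_sketch` and the parameters `sample_size` and `small` are hypothetical, and the binary-search lookup used here does not reproduce the $O(n \log \log n)$ bound stated above.

```python
import bisect
import random


def learned_sort_sketch(xs, sample_size=64, small=16, rng=None):
    """Illustrative model-based bucket sort (an assumption, not the paper's method)."""
    rng = rng or random.Random(0)
    n = len(xs)
    if n <= small:
        # Small inputs: fall back to the standard library sort.
        return sorted(xs)

    # "Train" the model: a sorted sample is a step-function CDF estimate.
    sample = sorted(rng.sample(xs, min(sample_size, n)))

    # Predicted bucket = number of sample points <= x, which is monotone in x,
    # so concatenating the sorted buckets in order yields a sorted list.
    buckets = [[] for _ in range(len(sample) + 1)]
    for x in xs:
        buckets[bisect.bisect_right(sample, x)].append(x)

    out = []
    for b in buckets:
        if len(b) == n:
            # The model failed to split the input (e.g. many duplicates):
            # use a classical sort so the recursion always terminates.
            out.extend(sorted(b))
        else:
            out.extend(learned_sort_sketch(b, sample_size, small, rng))
    return out
```

When the sample reflects the data distribution well, each bucket receives only a small share of the input, which is the intuition behind why this family of algorithms can beat comparison sorts in practice.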
