Naively Sorting Evolving Data is Optimal and Robust (2404.08162v3)
Abstract: We study sorting in the evolving data model, introduced by [AKMU11], where the true total order changes while the sorting algorithm is processing the input. More precisely, each comparison operation of the algorithm is followed by a sequence of evolution steps, where an evolution step perturbs the rank of a random item by a "small" random value. The goal is to maintain an ordering that remains close to the true order over time. Previous works have analyzed adaptations of classic sorting algorithms, assuming that an evolution step changes the rank of an item by just one, and that a fixed constant number $b$ of evolution steps take place between two comparisons. In fact, the only previous result achieving optimal linear total deviation, by [BvDEGJ18a], applies only to $b=1$. We analyze a very simple sorting algorithm suggested by [M14], which samples a random pair of adjacent items in each step and swaps them if they are out of order. We show that the algorithm achieves and maintains, with high probability, optimal total deviation, $O(n)$, and optimal maximum deviation, $O(\log n)$, under very general model settings. Namely, the perturbation introduced by each evolution step is sampled from a general distribution with a bounded moment generating function, and we only require that the average number of evolution steps between two sorting steps be bounded by an (arbitrary) constant, where the average is over a linear number of steps. The key ingredients of our proof are a novel potential function argument that inserts "gaps" in the list of items, and a general analysis framework which separates the analysis of sorting from that of the evolution steps, and is applicable to a variety of settings for which previous approaches do not apply. Our results settle conjectures and open problems in the aforementioned works, and provide theoretical support for empirical observations in [BvDEGJ18b].
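The interplay between sorting steps and evolution steps described above can be simulated in a few lines. The following is a minimal sketch under stated assumptions, not the authors' code. The true order is kept as a list `truth` (the item at index r has true rank r). Each comparison is followed by a fixed number `b` of evolution steps, whereas the paper only requires the average to be bounded. Each evolution step moves a random item by a uniform offset in [-max_shift, max_shift], which is one simple distribution with a bounded moment generating function. Total deviation is taken here as the sum over items of |position in the maintained list - true rank|, and maximum deviation as the largest such term. All function names and parameters (`evolve`, `sort_step`, `deviations`, `simulate`, `max_shift`) are hypothetical.

```python
import random


def evolve(truth, rng, max_shift=3):
    """One evolution step (assumed model): move a uniformly random item of the
    true order by a uniform offset in [-max_shift, max_shift], clamped to the
    ends. Bounded offsets stand in for a bounded-MGF perturbation distribution."""
    n = len(truth)
    r = rng.randrange(n)                        # current true rank of the perturbed item
    item = truth.pop(r)
    new_r = min(max(r + rng.randint(-max_shift, max_shift), 0), n - 1)
    truth.insert(new_r, item)                   # items in between shift by one rank


def sort_step(lst, truth, rng):
    """One sorting step: pick a random adjacent pair of the maintained list,
    compare it against the current true order, and swap it if it is inverted."""
    i = rng.randrange(len(lst) - 1)
    x, y = lst[i], lst[i + 1]
    if truth.index(x) > truth.index(y):         # the comparison queries the current truth
        lst[i], lst[i + 1] = y, x


def deviations(lst, truth):
    """Per-item |position in lst - true rank|."""
    rank = {item: r for r, item in enumerate(truth)}
    return [abs(p - rank[item]) for p, item in enumerate(lst)]


def simulate(n=100, rounds=100_000, b=2, seed=0):
    """Run `rounds` sorting steps, each followed by `b` evolution steps, starting
    from a random list; return (total deviation, maximum deviation)."""
    rng = random.Random(seed)
    truth = list(range(n))                      # initial true order: item r has rank r
    lst = truth[:]
    rng.shuffle(lst)                            # arbitrary (here random) starting order
    for _ in range(rounds):
        sort_step(lst, truth, rng)
        for _ in range(b):                      # fixed b per comparison: a simplifying assumption
            evolve(truth, rng)
    dev = deviations(lst, truth)
    return sum(dev), max(dev)


if __name__ == "__main__":
    total, worst = simulate()
    print(f"total deviation = {total}, max deviation = {worst}")
```

With `b=0` no evolution happens, and the process reduces to sorting a static list by random adjacent comparisons, which eventually reaches the true order. Running `simulate` for several values of `n` gives an informal way to observe how the two deviations scale. The abstract states these are $O(n)$ and $O(\log n)$ with high probability.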
- Algorithms on evolving graphs. In Proc. Innovations in Theoretical Computer Science, ITCS, pages 149–160, 2012. doi:10.1145/2090236.2090249.
- Sort me if you can: How to sort dynamic data. In Proc. 36th International Colloquium on Automata, Languages and Programming, ICALP, pages 339–350, 2009. doi:10.1007/978-3-642-02930-1_28.
- Sorting and selection on dynamic data. Theor. Comput. Sci., 412(24):2564–2576, 2011. doi:10.1016/j.tcs.2010.10.003.
- Community detection on evolving graphs. In Proc. Annual Conference on Neural Information Processing Systems, NIPS, pages 3522–3530, 2016. URL: https://proceedings.neurips.cc/paper_files/paper/2016/file/8698ff92115213ab187d31d4ee5da8ea-Paper.pdf.
- Optimally tracking labels on an evolving tree. In Proc. 34th Canadian Conference on Computational Geometry, CCCG, pages 1–8, 2022. URL: https://www.torontomu.ca/content/dam/canadian-conference-computational-geometry-2022/papers/CCCG2022_paper_51.pdf.
- Mixing times of the biased card shuffling and the asymmetric exclusion process. Trans. American Mathematical Society, 357(8):3013–3029, 2005. URL: http://www.jstor.org/stable/3845086.
- Pagerank on an evolving graph. In Proc. 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD, pages 24–32, 2012. doi:10.1145/2339530.2339539.
- Noisy sorting without resampling. In Proc. Nineteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA, pages 268–276, 2008. URL: http://dl.acm.org/citation.cfm?id=1347082.1347112.
- Optimally sorting evolving data. In Proc. 45th International Colloquium on Automata, Languages, and Programming, ICALP, pages 81:1–81:13, 2018. doi:10.4230/LIPIcs.ICALP.2018.81.
- Quadratic time algorithms appear to be optimal for sorting evolving data. In Proc. Twentieth Workshop on Algorithm Engineering and Experiments, ALENEX, pages 87–96, 2018. doi:10.1137/1.9781611975055.8.
- Dynamic graph algorithms. In Algorithms and Theory of Computation Handbook: General Concepts and Techniques. Chapman & Hall/CRC, 2nd edition, 2010. doi:10.1201/9781584888239.
- Spearman’s footrule as a measure of disarray. Journal of the Royal Statistical Society: Series B (Methodological), 39(2):262–268, 1977. doi:10.1111/j.2517-6161.1977.tb01624.x.
- Concentration of measure for the analysis of randomized algorithms. Cambridge University Press, Cambridge, 2009. doi:10.1017/CBO9780511581274.
- Analysis of systematic scan Metropolis algorithms using Iwahori-Hecke algebra techniques. Michigan Mathematical Journal, 48(1):157–190, 2000. doi:10.1307/mmj/1030132713.
- Efficient densest subgraph computation in evolving graphs. In Proc. 24th International Conference on World Wide Web, WWW, pages 300–310, 2015. doi:10.1145/2736277.2741638.
- Computing with noisy information. SIAM J. Comput., 23(5):1001–1018, 1994. doi:10.1137/S0097539791195877.
- Mark E. Glickman. Parameter estimation in large dynamic paired comparison experiments. Journal of the Royal Statistical Society Series C: Applied Statistics, 48(3):377–394, 1999. doi:10.1111/1467-9876.00159.
- New clocks, optimal line formation and self-replication population protocols. In Proc. 40th International Symposium on Theoretical Aspects of Computer Science, STACS, pages 33:1–33:22, 2023. doi:10.4230/LIPICS.STACS.2023.33.
- Optimal bounds for noisy sorting. In Proc. 55th Annual ACM Symposium on Theory of Computing, STOC, pages 1502–1515, 2023. doi:10.1145/3564246.3585131.
- Partial sorting problem on evolving data. Algorithmica, 79(3):960–983, 2017. doi:10.1007/S00453-017-0295-3.
- M. G. Kendall. A new measure of rank correlation. Biometrika, 30(1/2):81–93, 1938. URL: http://www.jstor.org/stable/2332226.
- Stable matching with evolving preferences. In Proc. Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX/RANDOM, pages 36:1–36:13, 2016. doi:10.4230/LIPICS.APPROX-RANDOM.2016.36.
- Donald Ervin Knuth. The art of computer programming, Volume III, 2nd Edition. Addison-Wesley, 1998. doi:10.5555/280635.
- Balanced allocations with incomplete information: The power of two queries. In Proc. 13th Innovations in Theoretical Computer Science Conference, ITCS, pages 103:1–103:23, 2022. doi:10.4230/LIPICS.ITCS.2022.103.
- Mohammad Mahdian. Algorithms on evolving data sets. Talk given at ICERM/Brown Workshop on Stochastic Graph Models, March 17–21, 2014. URL: https://icerm.brown.edu/materials/Slides/sp-s14-w2/Algorithms_on_Evolving_Data_Sets_]_Mohammad_Mahdian,_Google_INC.pdf.
- Agenda: Robust personalized pageranks in evolving graphs. In Proc. 30th ACM International Conference on Information and Knowledge Management, CIKM, pages 1315–1324, 2021. doi:10.1145/3459637.3482317.
- Efficient pagerank tracking in evolving networks. In Proc. 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD, pages 875–884, 2015. doi:10.1145/2783258.2783297.