Improved Algorithms for Maximum Coverage in Dynamic and Random Order Streams (2403.14087v1)
Abstract: The maximum coverage problem is to select $k$ sets from a collection of sets such that the cardinality of the union of the selected sets is maximized. We consider $(1-1/e-\epsilon)$-approximation algorithms for this NP-hard problem in three standard data stream models.
1. Dynamic Model. The stream consists of a sequence of sets being inserted and deleted. Our multi-pass algorithm uses $\epsilon^{-2} k \cdot \text{polylog}(n,m)$ space. The best previous result (Assadi and Khanna, SODA 2018) used $(n + \epsilon^{-4} k) \, \text{polylog}(n,m)$ space. While both algorithms use $O(\epsilon^{-1} \log n)$ passes, our analysis shows that when $\epsilon$ is a constant, it is possible to reduce the number of passes by a $1/\log \log n$ factor without incurring additional space.
2. Random Order Model. In this model, there are no deletions and the sets forming the instance are uniformly randomly permuted to form the input stream. We show that a single pass and $k \, \text{polylog}(n,m)$ space suffice for an arbitrarily small constant $\epsilon$. The best previous result, by Warneke et al. (ESA 2023), used $k^2 \, \text{polylog}(n,m)$ space.
3. Insert-Only Model. Lastly, our results, along with numerous previous results, use a sub-sampling technique introduced by McGregor and Vu (ICDT 2017) to sparsify the input instance. We explain how this technique and others used in the paper can be implemented such that the amortized update time of our algorithm is polylogarithmic. This also implies an improvement of the state-of-the-art insert-only algorithms in terms of the update time: $\text{polylog}(m,n)$ update time suffices, whereas the best previous result by Jaud et al. (SEA 2023) required update time that was linear in $k$.
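To make the problem statement concrete, the following is a minimal sketch of the classical offline greedy heuristic, which is known to achieve a $(1-1/e)$-approximation and is the benchmark the streaming guarantees above are measured against. It is not the paper's streaming algorithm: it assumes the whole instance fits in memory, and the names `greedy_max_coverage`, `coverage`, and the toy instance are hypothetical, chosen only for illustration.

```python
from typing import Dict, Hashable, List, Set


def greedy_max_coverage(sets: Dict[Hashable, Set[int]], k: int) -> List[Hashable]:
    """Offline greedy: repeatedly pick the set covering the most uncovered elements.

    Illustrative baseline only; assumes the full collection is stored in memory.
    """
    covered: Set[int] = set()
    chosen: List[Hashable] = []
    remaining = dict(sets)  # shallow copy so the caller's dict is not modified
    for _ in range(min(k, len(sets))):
        # Marginal gain of a set = number of its elements not yet covered.
        best = max(remaining, key=lambda name: len(remaining[name] - covered))
        if not remaining[best] - covered:
            break  # no remaining set adds new elements
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen


def coverage(sets: Dict[Hashable, Set[int]], chosen: List[Hashable]) -> int:
    """Cardinality of the union of the chosen sets."""
    return len(set().union(*(sets[name] for name in chosen)))


# Toy instance (hypothetical): four sets over the universe {1, ..., 6}.
toy = {"A": {1, 2, 3, 4}, "B": {3, 4, 5}, "C": {5, 6}, "D": {1, 6}}
picked = greedy_max_coverage(toy, k=2)
```

On this toy instance the greedy rule first takes `A` (four new elements) and then `C` (two new elements), covering all six elements with $k = 2$ sets. The streaming algorithms in the abstract aim for essentially the same approximation factor, up to $\epsilon$, while storing far less than the full instance.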
- Beating two-thirds for random-order streaming matching. In 48th International Colloquium on Automata, Languages and Programming (ICALP), page 19, 2021.
- Stochastic query covering for fast approximate document retrieval. ACM Transactions on Information Systems (TOIS), 33(3):1–35, 2015.
- Online maximum k-coverage. Discrete Applied Mathematics, 160(13-14):1901–1913, 2012.
- Correlation clustering in data streams. In ICML, volume 37 of JMLR Workshop and Conference Proceedings, pages 2237–2246. JMLR.org, 2015.
- Analyzing graph structure via linear measurements. In SODA, pages 459–467. SIAM, 2012.
- Graph sketches: sparsification, spanners, and subgraphs. In PODS, pages 5–14. ACM, 2012.
- Spectral sparsification in dynamic graph streams. In APPROX-RANDOM, volume 8096 of Lecture Notes in Computer Science, pages 1–10. Springer, 2013.
- Tight bounds on the round complexity of the distributed maximum coverage problem. In Artur Czumaj, editor, Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2018, New Orleans, LA, USA, January 7-10, 2018, pages 2412–2431. SIAM, 2018.
- Tight bounds for single-pass streaming complexity of the set cover problem. In STOC, pages 698–711. ACM, 2016.
- Maximum matchings in dynamic graph streams and the simultaneous communication model. In SODA, pages 1345–1364. SIAM, 2016.
- Sepehr Assadi. Tight space-approximation tradeoff for the multi-pass streaming set cover problem. In Proceedings of the 36th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, pages 321–335, 2017.
- Submodular secretary problem with shortlists. In Avrim Blum, editor, 10th Innovations in Theoretical Computer Science Conference, ITCS 2019, January 10-12, 2019, San Diego, California, USA, volume 124 of LIPIcs, pages 1:1–1:19. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2019.
- Almost optimal streaming algorithms for coverage problems. In Christian Scheideler and Mohammad Taghi Hajiaghayi, editors, Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA 2017, Washington DC, USA, July 24-26, 2017, pages 13–23. ACM, 2017.
- Space- and time-efficient algorithm for maintaining dense subgraphs on one-pass dynamic streams. In STOC, pages 173–182. ACM, 2015.
- Streaming submodular maximization: Massive data summarization on the fly. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 671–680, 2014.
- Kernelization via sampling with applications to finding matchings and related problems in dynamic graph streams. In SODA, pages 1326–1344. SIAM, 2016.
- Streaming algorithms for submodular function maximization. In ICALP (1), volume 9134 of Lecture Notes in Computer Science, pages 318–330. Springer, 2015.
- Set cover algorithms for very large datasets. In CIKM, pages 479–488. ACM, 2010.
- Incidence geometries and the pass complexity of semi-streaming set cover. In SODA, pages 1365–1373. SIAM, 2016.
- On streaming and communication complexity of the set cover problem. In Fabian Kuhn, editor, Distributed Computing - 28th International Symposium, DISC 2014, Austin, TX, USA, October 12-15, 2014. Proceedings, volume 8784 of Lecture Notes in Computer Science, pages 484–498. Springer, 2014.
- Semi-streaming set cover - (extended abstract). In ICALP (1), volume 8572 of Lecture Notes in Computer Science, pages 453–464. Springer, 2014.
- Uriel Feige. A threshold of ln n for approximating set cover. J. ACM, 45(4):634–652, 1998.
- The one-way communication complexity of submodular maximization with applications to streaming and robustness. In Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, pages 1363–1374, 2020.
- Stream order and order statistics: Quantile estimation in random-order streams. SIAM Journal on Computing, 38(5):2044–2059, 2009.
- Vertex and hyperedge connectivity in dynamic graph streams. In PODS, pages 241–247. ACM, 2015.
- Towards tight bounds for the streaming set cover problem. In PODS, pages 371–383. ACM, 2016.
- Analysis of the greedy approach in problems of maximum k-coverage. Naval Research Logistics (NRL), 45(6):615–627, 1998.
- Tight trade-offs for the maximum $k$-coverage problem in the general streaming model. In 38th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, pages 200–217, 2019.
- Tight bounds for $L_p$ samplers, finding duplicates in streams, and related problems. In Maurizio Lenzerini and Thomas Schwentick, editors, Proceedings of the 30th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2011, June 12-16, 2011, Athens, Greece, pages 49–58. ACM, 2011.
- Maximum Coverage in Sublinear Space, Faster. In 21st International Symposium on Experimental Algorithms (SEA 2023), volume 265, pages 21:1–21:20, Dagstuhl, Germany, 2023.
- Richard M Karp. Reducibility among combinatorial problems. In Complexity of Computer Computations, pages 85–103. Springer, 1972.
- Near-optimal observation selection using submodular functions. In AAAI, pages 1650–1654. AAAI Press, 2007.
- Submodular function maximization. In Lucas Bordeaux, Youssef Hamadi, and Pushmeet Kohli, editors, Tractability: Practical Approaches to Hard Problems, pages 71–104. Cambridge University Press, 2014.
- Set cover in the one-pass edge-arrival streaming model. In Floris Geerts, Hung Q. Ngo, and Stavros Sintos, editors, Proceedings of the 42nd ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS 2023, Seattle, WA, USA, June 18-23, 2023, pages 127–139. ACM, 2023.
- Maximizing the spread of influence through a social network. Theory of Computing, 11:105–147, 2015.
- Single pass spectral sparsification in dynamic streams. In FOCS, pages 561–570. IEEE Computer Society, 2014.
- Maximum matching in semi-streaming with few passes. In Approximation, Randomization, and Combinatorial Optimization: Algorithms and Techniques (APPROX/RANDOM), pages 231–242, 2012.
- Fast moment estimation in data streams in optimal space. In Proceedings of the forty-third annual ACM symposium on Theory of computing, pages 745–754, 2011.
- Christian Konrad. Maximum matching in turnstile streams. In ESA, volume 9294 of Lecture Notes in Computer Science, pages 840–852. Springer, 2015.
- Spanners and sparsifiers in dynamic streams. In PODC, pages 272–281. ACM, 2014.
- Lazy and eager approaches for the set cover problem. In Proceedings of the 37th Australasian Computer Science Conference, pages 19–27, 2014.
- Cardinality constrained submodular maximization for random streams. In Marc’Aurelio Ranzato, Alina Beygelzimer, Yann N. Dauphin, Percy Liang, and Jennifer Wortman Vaughan, editors, Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, virtual, pages 6491–6502, 2021.
- Andrew McGregor. Graph stream algorithms: a survey. SIGMOD Record, 43(1):9–20, 2014.
- Morteza Monemizadeh. Dynamic submodular maximization. In H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems, volume 33, pages 9806–9817. Curran Associates, Inc., 2020.
- Maximum coverage in the data stream model: Parameterized and generalized. In Ke Yi and Zhewei Wei, editors, 24th International Conference on Database Theory, ICDT 2021, March 23-26, 2021, Nicosia, Cyprus, volume 186 of LIPIcs, pages 12:1–12:20. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2021.
- Densest subgraph in dynamic graph streams. In MFCS (2), volume 9235 of Lecture Notes in Computer Science, pages 472–482. Springer, 2015.
- Better streaming algorithms for the maximum coverage problem. Theory Comput. Syst., 63(7):1595–1619, 2019.
- Better algorithms for counting triangles in data streams. In PODS, pages 401–411. ACM, 2016.
- Beyond 1/2-approximation for submodular maximization on massive data streams. In Proceedings of the 35th International Conference on Machine Learning, pages 3829–3838, 2018.
- On maximum coverage in the streaming model & application to multi-topic blog-watch. In SDM, pages 697–708. SIAM, 2009.
- Joachim von zur Gathen and Jürgen Gerhard. Modern Computer Algebra. Cambridge University Press, 1999.
- New classes and applications of hash functions. In Proc. 20th Annual IEEE Symposium on Foundations of Computer Science, pages 175–182, 1979.
- Maximum coverage in random-arrival streams. In Inge Li Gørtz, Martin Farach-Colton, Simon J. Puglisi, and Grzegorz Herman, editors, 31st Annual European Symposium on Algorithms, ESA 2023, September 4-6, 2023, Amsterdam, The Netherlands, volume 274 of LIPIcs, pages 102:1–102:15. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2023.
- Set coverage problems in a one-pass data stream. In SDM, pages 758–766. SIAM, 2013.
Authors: Amit Chakrabarti, Andrew McGregor, Anthony Wirth