Near Uniform Triangle Sampling Over Adjacency List Graph Streams
Abstract: Triangle counting and triangle sampling are two fundamental problems for streaming algorithms. Designing sampling algorithms is arguably more challenging than designing their counting counterparts, yet triangle counting has received far more attention in the literature. In this work, we consider the problem of approximately sampling triangles in several streaming models, with a focus on the adjacency list model. In this problem, the edges of a graph $G$ arrive over a data stream. The goal is to design space-efficient streaming algorithms that output a triangle drawn from a distribution over the triangles in $G$ that is close to the uniform distribution over those triangles, where closeness is measured in $\ell_1$-distance. The main technical contribution of this paper is a set of algorithms for this triangle sampling problem in the adjacency list model whose space complexities match those of their counting variants. For completeness, we also show results for the vertex arrival and edge arrival models.
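To make the closeness criterion concrete, the following is a minimal offline sketch, not the paper's streaming algorithm. It enumerates the triangles of a small graph by brute force and computes the $\ell_1$-distance between a given output distribution and the uniform distribution over those triangles. The function names `triangles` and `l1_distance_from_uniform` are hypothetical, and the sketch assumes the output distribution is supported only on triangles of the graph.

```python
def triangles(edges):
    """Return all triangles of an undirected graph as sorted vertex triples."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    found = set()
    for u, v in edges:
        # Any common neighbour w of u and v closes the triangle {u, v, w}.
        for w in adj[u] & adj[v]:
            found.add(tuple(sorted((u, v, w))))
    return found


def l1_distance_from_uniform(dist, tris):
    """Sum over triangles t of |dist(t) - 1/|T||.

    `dist` maps a triangle to its output probability; triangles missing
    from `dist` are treated as having probability 0. Assumes `dist`
    places no mass outside `tris`.
    """
    uniform = 1.0 / len(tris)
    return sum(abs(dist.get(t, 0.0) - uniform) for t in tris)


# The complete graph on 4 vertices contains exactly 4 triangles.
k4_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
k4_triangles = triangles(k4_edges)

# A perfectly uniform sampler is at distance 0 from uniform.
uniform_dist = {t: 0.25 for t in k4_triangles}

# A hypothetical sampler that favours one triangle.
ordered = sorted(k4_triangles)
skewed_dist = {ordered[0]: 0.4, ordered[1]: 0.2, ordered[2]: 0.2, ordered[3]: 0.2}

uniform_gap = l1_distance_from_uniform(uniform_dist, k4_triangles)
skewed_gap = l1_distance_from_uniform(skewed_dist, k4_triangles)
```

Under this measure, a sampler counts as near uniform when the computed distance is below a chosen error parameter. The streaming setting adds the constraint that the sampler cannot store the whole graph, which this offline check ignores.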