
Join Sampling under Acyclic Degree Constraints and (Cyclic) Subgraph Sampling (2312.12797v1)

Published 20 Dec 2023 in cs.DB and cs.DS

Abstract: Given a join with an acyclic set of degree constraints, we show how to draw a uniformly random sample from the join result in $O(\mathit{polymat} / \max\{1, \mathrm{OUT}\})$ expected time after a preprocessing of $O(\mathrm{IN})$ expected time, where $\mathrm{IN}$, $\mathrm{OUT}$, and $\mathit{polymat}$ are the join's input size, output size, and polymatroid bound, respectively. This compares favorably with the state of the art (Deng et al. and Kim et al., both in PODS'23), which states that a uniformly random sample can be drawn in $\tilde{O}(\mathrm{AGM} / \max\{1, \mathrm{OUT}\})$ expected time after a preprocessing phase of $\tilde{O}(\mathrm{IN})$ expected time, where $\mathrm{AGM}$ is the join's AGM bound. We then utilize our techniques to tackle directed subgraph sampling. Let $G = (V, E)$ be a directed data graph where each vertex has an out-degree at most $\lambda$, and let $P$ be a directed pattern graph with $O(1)$ vertices. The objective is to uniformly sample an occurrence of $P$ in $G$. The problem can be modeled as join sampling with input size $\mathrm{IN} = \Theta(|E|)$ but, whenever $P$ contains cycles, the converted join has cyclic degree constraints. We show that it is always possible to throw away certain degree constraints such that (i) the remaining constraints are acyclic and (ii) the new join has asymptotically the same polymatroid bound $\mathit{polymat}$ as the old one. Combining this finding with our new join sampling solution yields an algorithm to sample from the original (cyclic) join (thereby yielding a uniformly random occurrence of $P$) in $O(\mathit{polymat} / \max\{1, \mathrm{OUT}\})$ expected time after $O(|E|)$ expected-time preprocessing. We also prove similar results for undirected subgraph sampling and demonstrate how our techniques can be significantly simplified in that scenario.
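To make the "bound / OUT expected time" shape of these results concrete, here is a minimal sketch of classic accept/reject sampling for a two-relation join R(A, B) ⋈ S(B, C) under a single degree constraint. This is not the paper's algorithm; it is a textbook-style baseline, and the names `build_index`, `sample_join`, `max_deg`, and the toy relations are hypothetical. Each trial picks a tuple of R uniformly, accepts with probability deg(b) / max_deg, and then picks a matching S tuple uniformly, so every join tuple is returned with probability 1 / (|R| · max_deg) per trial and the expected number of trials is |R| · max_deg / OUT.

```python
import random
from collections import defaultdict

def build_index(S):
    """Preprocessing: group S(B, C) tuples by B in one pass over S."""
    index = defaultdict(list)
    for b, c in S:
        index[b].append(c)
    return index

def sample_join(R, index, max_deg, rng, max_trials=100_000):
    """Return a uniformly random tuple of R(A,B) JOIN S(B,C), or None.

    Assumes max_deg >= the number of S tuples sharing any single B value
    (a degree constraint on S). None is returned if no trial succeeds,
    which always happens when the join is empty.
    """
    for _ in range(max_trials):
        a, b = rng.choice(R)                 # uniform tuple of R
        matches = index.get(b, [])           # S tuples joining with it
        # Accept with probability deg(b) / max_deg so that every join
        # tuple has the same chance 1 / (|R| * max_deg) per trial.
        if rng.random() < len(matches) / max_deg:
            return (a, b, rng.choice(matches))
    return None

# Toy data (hypothetical): the join has 5 result tuples.
R = [(1, 10), (2, 10), (3, 20), (4, 30)]
S = [(10, "x"), (10, "y"), (20, "z")]
index = build_index(S)
max_deg = max(len(cs) for cs in index.values())
rng = random.Random(0)
samples = [sample_join(R, index, max_deg, rng) for _ in range(1000)]
```

On this toy input the bound |R| · max_deg is 8 and OUT is 5, so a trial succeeds with probability 5/8. The abstract's contribution can be read as replacing such a simple product bound with the polymatroid bound for any acyclic set of degree constraints.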

References (39)
  1. Listing 4-cycles. CoRR, abs/2211.10022, 2022.
  2. Foundations of Databases. Addison-Wesley, 1995.
  3. Join synopses for approximate query answering. In Proceedings of ACM Management of Data (SIGMOD), pages 275–286, 1999.
  4. Noga Alon. On the number of subgraphs of prescribed type of graphs with a given number of edges. Israel Journal of Mathematics, 38:116–130, 1981.
  5. Box covers and domain orderings for beyond worst-case join processing. In Proceedings of International Conference on Database Theory (ICDT), pages 3:1–3:23, 2021.
  6. A simple sublinear-time algorithm for counting arbitrary subgraphs via edge sampling. In Innovations in Theoretical Computer Science (ITCS), pages 6:1–6:20, 2019.
  7. Size bounds and query plans for relational joins. SIAM Journal on Computing, 42(4):1737–1767, 2013.
  8. Parameterized aspects of triangle enumeration. Journal of Computer and System Sciences (JCSS), 103:61–77, 2019.
  9. Listing triangles. In Proceedings of International Colloquium on Automata, Languages and Programming (ICALP), pages 223–234, 2014.
  10. On random sampling over joins. In Proceedings of ACM Management of Data (SIGMOD), pages 263–274, 1999.
  11. Yu Chen and Ke Yi. Random sampling and size estimation over cyclic joins. In Proceedings of International Conference on Database Theory (ICDT), pages 7:1–7:18, 2020.
  12. N. Chiba and T. Nishizeki. Arboricity and subgraph listing algorithms. SIAM Journal on Computing, 14(1):210–223, 1985.
  13. Degree sequence bound for join cardinality estimation. In Proceedings of International Conference on Database Theory (ICDT), volume 255, pages 8:1–8:18, 2023.
  14. On join sampling and the hardness of combinatorial output-sensitive join algorithms. In Proceedings of ACM Symposium on Principles of Database Systems (PODS), pages 99–111, 2023.
  15. David Eppstein. Subgraph isomorphism in planar graphs and related problems. J. Graph Algorithms Appl., 3(3):1–27, 1999.
  16. Sampling arbitrary subgraphs exactly uniformly in sublinear time. In Proceedings of International Colloquium on Automata, Languages and Programming (ICALP), pages 45:1–45:13, 2020.
  17. Entropy bounds for conjunctive queries with functional dependencies. In Proceedings of International Conference on Database Theory (ICDT), volume 68, pages 15:1–15:17, 2017.
  18. Finding and listing induced paths and cycles. Discrete Applied Mathematics, 161(4-5):633–641, 2013.
  19. Worst-case optimal binary join algorithms under general $\ell_p$ constraints. CoRR, abs/2112.01003, 2021.
  20. Ce Jin and Yinzhan Xu. Removing additive structure in 3SUM-based reductions. In Proceedings of ACM Symposium on Theory of Computing (STOC), pages 405–418, 2023.
  21. It’s all a matter of degree - using degree information to optimize multiway joins. Theory Comput. Syst., 62(4):810–853, 2018.
  22. Join size bounds using $\ell_p$-norms on degree sequences. CoRR, abs/2306.14075, 2023.
  23. Joins via geometric resolutions: Worst case and beyond. ACM Transactions on Database Systems (TODS), 41(4):22:1–22:45, 2016.
  24. Computing join queries with functional dependencies. In Proceedings of ACM Symposium on Principles of Database Systems (PODS), pages 327–342, 2016.
  25. What do Shannon-type inequalities, submodular width, and disjunctive datalog have to do with one another? In Proceedings of ACM Symposium on Principles of Database Systems (PODS), pages 429–444, 2017.
  26. Guaranteeing the $\tilde{O}$(AGM/OUT) runtime for uniform sampling and size estimation over joins. In Proceedings of ACM Symposium on Principles of Database Systems (PODS), pages 113–125, 2023.
  27. George Manoussakis. Listing all fixed-length simple cycles in sparse graphs in optimal time. In Fundamentals of Computation Theory, pages 355–366, 2017.
  28. Optimal joins using compact data structures. In Proceedings of International Conference on Database Theory (ICDT), volume 155, pages 21:1–21:21, 2020.
  29. On the complexity of the subgraph problem. Commentationes Mathematicae Universitatis Carolinae, 26(2):415–419, 1985.
  30. Hung Q. Ngo. Worst-case optimal join algorithms: Techniques, results, and open problems. In Proceedings of ACM Symposium on Principles of Database Systems (PODS), pages 111–124, 2018.
  31. Beyond worst-case analysis for joins with minesweeper. In Proceedings of ACM Symposium on Principles of Database Systems (PODS), pages 234–245, 2014.
  32. Worst-Case Optimal Join Algorithms: [Extended Abstract]. In Proceedings of ACM Symposium on Principles of Database Systems (PODS), pages 37–48, 2012.
  33. Worst-case optimal join algorithms. Journal of the ACM (JACM), 65(3):16:1–16:40, 2018.
  34. Skew strikes back: new developments in the theory of join algorithms. SIGMOD Rec., 42(4):5–16, 2013.
  35. Alexander Schrijver. Combinatorial Optimization: Polyhedra and Efficiency. Springer-Verlag, 2003.
  36. Dan Suciu. Applications of information inequalities to database theory problems. CoRR, abs/2304.11996, 2023.
  37. Maciej M. Syslo. An efficient cycle vector space algorithm for listing all cycles of a planar graph. SIAM Journal on Computing, 10(4):797–808, 1981.
  38. Todd L. Veldhuizen. Triejoin: A simple, worst-case optimal join algorithm. In Proceedings of International Conference on Database Theory (ICDT), pages 96–106, 2014.
  39. Random sampling over joins revisited. In Proceedings of ACM Management of Data (SIGMOD), pages 1525–1539, 2018.