
Color: A Framework for Applying Graph Coloring to Subgraph Cardinality Estimation (2405.06767v1)

Published 10 May 2024 in cs.DB

Abstract: Graph workloads pose a particularly challenging problem for query optimizers. They typically feature large queries made up entirely of many-to-many joins with complex correlations. This puts significant stress on traditional cardinality estimation methods, which generally see catastrophic errors when estimating the size of queries with only a handful of joins. To overcome this, we propose COLOR, a framework for subgraph cardinality estimation which applies insights from graph compression theory to produce a compact summary that captures the global topology of the data graph. Further, we identify several key optimizations that enable tractable estimation over this summary even for large query graphs. We then evaluate several designs within this framework and find that they improve accuracy by up to $10^3$x over all competing methods while maintaining fast inference, a small memory footprint, efficient construction, and graceful degradation under updates.

