GGDMiner -- Discovery of Graph Generating Dependencies for Graph Data Profiling (2403.17082v1)

Published 25 Mar 2024 in cs.DB

Abstract: With the increasing use of graph-structured data, there is growing interest in graph data dependencies and their applications, e.g., in graph data profiling. Graph Generating Dependencies (GGDs) are a class of dependencies for property graphs that can express the relation between different graph patterns and constraints based on their attribute similarities. The rich syntax and semantics of GGDs make them a good candidate for graph data profiling. Nonetheless, GGDs are difficult to define manually, especially when no data experts are available. In this paper, we propose GGDMiner, a framework for automatically discovering approximate GGDs from graph data, with the goal of profiling graph data through GGDs for the user. GGDMiner has three main steps: (1) pre-processing, (2) candidate generation, and (3) GGD extraction. To optimize memory consumption and execution time, GGDMiner uses a factorized representation of each discovered graph pattern, called Answer Graph. Our results show that the discovered set of GGDs can give an overview of the input graph, covering both schema-level information and correlations between graph patterns and attributes.
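The abstract does not detail how the Answer Graph is built, so the following is only a minimal sketch of the general idea behind a factorized match representation, not GGDMiner's implementation. It assumes chain-shaped patterns `x0 -l0-> x1 -l1-> ... -> xk`, matching on edge labels only (node labels and attributes are ignored), and homomorphism semantics. Instead of materializing every full match, it keeps one set of node pairs per pattern edge, prunes them with a forward and a backward semijoin pass so every kept pair lies on some complete match, and counts matches directly from that factorized form. All function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical data model: a data edge is (source, label, target).
Edge = tuple[str, str, str]


def build_answer_graph(edges: list[Edge], chain: list[str]) -> list[set[tuple[str, str]]]:
    """Factorize all matches of a chain pattern into one pair-set per pattern edge."""
    # Initial candidates: every data edge whose label equals the pattern edge label.
    layers = [{(s, t) for s, lab, t in edges if lab == want} for want in chain]
    # Forward pass: keep a pair only if its source is reached by layer i-1.
    for i in range(1, len(layers)):
        reached = {t for _, t in layers[i - 1]}
        layers[i] = {(s, t) for s, t in layers[i] if s in reached}
    # Backward pass: keep a pair only if its target continues into layer i+1.
    # For acyclic (chain) patterns, these two passes leave only pairs that
    # belong to at least one complete match.
    for i in range(len(layers) - 2, -1, -1):
        continues = {s for s, _ in layers[i + 1]}
        layers[i] = {(s, t) for s, t in layers[i] if t in continues}
    return layers


def count_matches(layers: list[set[tuple[str, str]]]) -> int:
    """Count complete matches without enumerating them (dynamic programming)."""
    # counts[n] = number of partial matches of the prefix ending at data node n.
    counts = {s: 1 for s, _ in layers[0]}
    for layer in layers:
        nxt: dict[str, int] = defaultdict(int)
        for s, t in layer:
            nxt[t] += counts.get(s, 0)
        counts = nxt
    return sum(counts.values())


def enumerate_matches(layers: list[set[tuple[str, str]]]) -> list[list[str]]:
    """Expand the factorized form into explicit matches (for comparison only)."""
    matches = [[s, t] for s, t in layers[0]]
    for layer in layers[1:]:
        out: dict[str, list[str]] = defaultdict(list)
        for s, t in layer:
            out[s].append(t)
        matches = [m + [t] for m in matches for t in out[m[-1]]]
    return matches


if __name__ == "__main__":
    edges = [
        ("alice", "knows", "bob"),
        ("alice", "knows", "carol"),
        ("dave", "knows", "bob"),
        ("frank", "knows", "grace"),  # grace lives nowhere, so this is pruned
        ("bob", "livesIn", "paris"),
        ("carol", "livesIn", "rome"),
        ("erin", "livesIn", "oslo"),  # nobody known lives here, so this is pruned
    ]
    layers = build_answer_graph(edges, ["knows", "livesIn"])
    print(count_matches(layers))  # 3
    print(sorted(enumerate_matches(layers)))
```

The design choice being illustrated is that storage grows with the number of distinct participating edges per pattern edge rather than with the number of full matches, which can be much larger when nodes have high fan-out; counts needed for support or confidence can still be computed from the factorized form.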
