GGDMiner -- Discovery of Graph Generating Dependencies for Graph Data Profiling (2403.17082v1)
Abstract: As graph-structured data becomes more widely used, interest is growing in graph data dependencies and their applications, e.g., in graph data profiling. Graph Generating Dependencies (GGDs) are a class of dependencies for property graphs that can express relations between different graph patterns, together with constraints based on the similarity of their attributes. The rich syntax and semantics of GGDs make them a good candidate for graph data profiling. Nonetheless, GGDs are difficult to define manually, especially when no data experts are available. In this paper, we propose GGDMiner, a framework for automatically discovering approximate GGDs from graph data, with the intention of profiling graph data through GGDs for the user. GGDMiner has three main steps: (1) pre-processing, (2) candidate generation, and (3) GGD extraction. To optimize memory consumption and execution time, GGDMiner uses a factorized representation of each discovered graph pattern, called an Answer Graph. Our results show that the discovered set of GGDs can give an overview of the input graph, covering both schema-level information and correlations between graph patterns and attributes.
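As a rough illustration of why a factorized representation of pattern matches can save memory, the sketch below stores, for a two-edge path pattern, only the data edges that take part in at least one match, and expands them into explicit matches on demand. This is a simplified sketch of the general idea and not GGDMiner's implementation: the toy graph, the labels, and the function names (`answer_graph`, `enumerate_matches`) are all hypothetical.

```python
from collections import defaultdict

# Toy property graph as (source, edge_label, target) triples. All data is made up.
EDGES = [
    ("p1", "worksAt", "c1"),
    ("p2", "worksAt", "c1"),
    ("p3", "worksAt", "c1"),
    ("c1", "locatedIn", "x1"),
    ("c1", "locatedIn", "x2"),
    ("p4", "worksAt", "c2"),  # c2 has no location, so this edge is in no match
]

def answer_graph(edges, first_label, second_label):
    """Factorized matches of the path pattern (a)-[first]->(b)-[second]->(c).

    Returns one set of data edges per pattern edge, keeping only edges that
    take part in at least one full match of the pattern.
    """
    first = [(s, t) for s, label, t in edges if label == first_label]
    second = [(s, t) for s, label, t in edges if label == second_label]
    # Nodes that can play the role of the shared middle variable b.
    join_nodes = {t for _, t in first} & {s for s, _ in second}
    return (
        {(s, t) for s, t in first if t in join_nodes},
        {(s, t) for s, t in second if s in join_nodes},
    )

def enumerate_matches(factorized):
    """Expand the factorized form back into explicit (a, b, c) matches."""
    first, second = factorized
    targets_by_source = defaultdict(list)
    for s, t in second:
        targets_by_source[s].append(t)
    return sorted((a, b, c) for a, b in first for c in targets_by_source[b])

ag = answer_graph(EDGES, "worksAt", "locatedIn")
matches = enumerate_matches(ag)
stored_edges = len(ag[0]) + len(ag[1])
print(f"{stored_edges} stored edges represent {len(matches)} matches")
```

In this toy case the factorized form keeps 5 edges while the explicit result has 6 three-node matches; the gap widens as more edges share the same join node, since the number of explicit matches grows multiplicatively while the number of stored edges grows additively.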