Cardinality Estimation on Hyper-relational Knowledge Graphs (2405.15231v1)
Abstract: Cardinality estimation (CE) for a query estimates the number of results without executing the query, and it serves as an effective indicator in query optimization. Recently, CE has achieved great success over knowledge graphs (KGs) that consist of triple facts. To represent facts more precisely, researchers have proposed hyper-relational KGs (HKGs), which attach qualifiers to a triple fact, where the qualifiers provide additional context for the fact. However, existing CE methods for KGs perform unsatisfactorily on HKGs because of the complexity that qualifiers introduce. Moreover, only one dataset exists for HKG query cardinality estimation, WD50K-QE, which is not comprehensive and covers only a limited set of query patterns. This lack of querysets over HKGs is a bottleneck for comprehensively investigating CE problems on HKGs. In this work, we first construct diverse and unbiased hyper-relational querysets over three popular HKGs for investigating CE. We also propose a novel qualifier-attached graph neural network (GNN) model that effectively incorporates qualifier information and adaptively combines the outputs of multiple GNN layers to predict the cardinality accurately. Our experiments show that the proposed hyper-relational query encoder outperforms all state-of-the-art CE methods over three popular HKGs on the diverse and unbiased benchmark.
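To make the terms in the abstract concrete, the following is a minimal Python sketch of a hyper-relational fact (a triple plus qualifier pairs) and of the exact cardinality that a CE method tries to predict without running the query. It is not the paper's model or data format. The toy facts, the `Fact` layout, and the helper names `matches` and `true_cardinality` are assumptions made for illustration, and the query is limited to a single fact pattern in which `None` stands for a variable.

```python
from typing import Dict, List, Optional, Tuple

# Assumed layout: (subject, relation, object, qualifiers), where qualifiers
# maps a qualifier relation to a qualifier value.
Fact = Tuple[str, str, str, Dict[str, str]]

# Toy HKG; entities and relations are hypothetical, chosen for illustration.
FACTS: List[Fact] = [
    ("Einstein", "educated_at", "ETH_Zurich", {"degree": "BSc"}),
    ("Einstein", "educated_at", "Univ_Zurich", {"degree": "PhD"}),
    ("Curie", "educated_at", "Univ_Paris", {"degree": "PhD"}),
]


def matches(term: Optional[str], value: str) -> bool:
    """A term of None acts as a query variable and matches any value."""
    return term is None or term == value


def true_cardinality(
    s: Optional[str],
    r: Optional[str],
    o: Optional[str],
    qualifiers: Dict[str, Optional[str]],
) -> int:
    """Count the facts matching one hyper-relational pattern by executing it.

    This brute-force scan is the ground truth that an estimator would
    approximate without touching the data.
    """
    count = 0
    for fs, fr, fo, fq in FACTS:
        if not (matches(s, fs) and matches(r, fr) and matches(o, fo)):
            continue
        # Every qualifier in the query must be present on the fact and match.
        if all(k in fq and matches(v, fq[k]) for k, v in qualifiers.items()):
            count += 1
    return count


# "Who was educated where with a PhD?": (?s, educated_at, ?o, degree=PhD)
phd_count = true_cardinality(None, "educated_at", None, {"degree": "PhD"})
```

The qualifier dictionary is what distinguishes this from a plain triple pattern: dropping the `degree` constraint changes the count, which is why estimators built for triple-only KGs can miss on HKG queries.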