Knowledge Graph Extension by Entity Type Recognition (2405.02463v1)
Abstract: Knowledge graphs have emerged as a sophisticated advancement and refinement of semantic networks, and their deployment is one of the critical methodologies in contemporary artificial intelligence. Constructing a knowledge graph is a multifaceted process involving various techniques; because building from scratch entails significant labor and time costs, researchers aim to extract knowledge from existing resources. However, due to the pervasive issue of heterogeneity, the diversity of descriptions across different knowledge graphs can lead to mismatches between concepts, reducing the efficacy of knowledge extraction. This Ph.D. study focuses on automatic knowledge graph extension, i.e., properly extending a reference knowledge graph by extracting and integrating concepts from one or more candidate knowledge graphs. We propose a novel knowledge graph extension framework based on entity type recognition. The framework aims to achieve high-quality knowledge extraction by aligning the schemas and entities across different knowledge graphs, thereby enhancing the performance of the extension. This paper presents three major contributions: (i) we propose an entity type recognition method that exploits machine learning and property-based similarities to enhance knowledge extraction; (ii) we introduce a set of assessment metrics to validate the quality of the extended knowledge graphs; (iii) we develop a platform for knowledge graph acquisition, management, and extension that offers practical support to knowledge engineers. Our evaluation demonstrates the feasibility and effectiveness of the proposed extension framework and its functionalities through quantitative experiments and case studies.
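To make the idea of property-based similarity in contribution (i) concrete, the following is a minimal sketch, not the paper's actual method. It assumes each entity type is described only by a set of property names and uses plain Jaccard overlap as the similarity; the names `jaccard`, `best_match`, the example types, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
from typing import Dict, Optional, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap of two property sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_match(
    candidate_props: Set[str],
    reference_types: Dict[str, Set[str]],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> Optional[Tuple[str, float]]:
    """Return the reference type most similar to the candidate, or None
    if no reference type reaches the threshold."""
    best: Optional[Tuple[str, float]] = None
    for name, props in reference_types.items():
        score = jaccard(candidate_props, props)
        if score >= threshold and (best is None or score > best[1]):
            best = (name, score)
    return best


# Hypothetical reference schema and candidate entity type.
reference = {
    "Person": {"name", "birthDate", "nationality"},
    "Organization": {"name", "foundingDate", "location"},
}
candidate = {"name", "birthDate", "gender"}

print(best_match(candidate, reference))
```

In this toy setting the candidate shares two of four distinct properties with `Person` and one of five with `Organization`, so only `Person` reaches the threshold. A learned model, as the abstract describes, would typically combine several such similarity features instead of relying on a single fixed threshold.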