
Contextualized Structural Self-supervised Learning for Ontology Matching (2310.03840v1)

Published 5 Oct 2023 in cs.LG, cs.AI, and cs.CL

Abstract: Ontology matching (OM) entails the identification of semantic relationships between concepts within two or more knowledge graphs (KGs) and serves as a critical step in integrating KGs from various sources. Recent advancements in deep OM models have harnessed the power of transformer-based LLMs and the advantages of knowledge graph embedding. Nevertheless, these OM models still face persistent challenges, such as a lack of reference alignments, runtime latency, and diverse graph structures that remain unexplored within an end-to-end framework. In this study, we introduce LaKERMap, a novel self-supervised learning OM framework that operates on the input ontologies. This framework capitalizes on the contextual and structural information of concepts by integrating implicit knowledge into transformers. Specifically, we aim to capture multiple structural contexts, encompassing both local and global interactions, by employing distinct training objectives. To assess our methods, we utilize the Bio-ML datasets and tasks. Our results show that LaKERMap surpasses state-of-the-art systems in terms of alignment quality and inference time. Our models and code are available at https://github.com/ellenzhuwang/lakermap.

Authors (1)
  1. Zhu Wang (72 papers)
Citations (2)