Empowering Small-Scale Knowledge Graphs: A Strategy of Leveraging General-Purpose Knowledge Graphs for Enriched Embeddings (2405.10745v1)

Published 17 May 2024 in cs.LG, cs.AI, and cs.CL

Abstract: Knowledge-intensive tasks pose a significant challenge for Machine Learning (ML) techniques. Commonly adopted methods, such as LLMs, often exhibit limitations when applied to such tasks. Nevertheless, there have been notable endeavours to mitigate these challenges, with a significant emphasis on augmenting LLMs through Knowledge Graphs (KGs). While KGs provide many advantages for representing knowledge, their development costs can deter extensive research and applications. Addressing this limitation, we introduce a framework for enriching embeddings of small-scale domain-specific Knowledge Graphs with well-established general-purpose KGs. Adopting our method, a modest domain-specific KG can benefit from a performance boost in downstream tasks when linked to a substantial general-purpose KG. Experimental evaluations demonstrate a notable enhancement, with up to a 44% increase observed in the Hits@10 metric. This relatively unexplored research direction can catalyze more frequent incorporation of KGs in knowledge-intensive tasks, resulting in more robust, reliable ML implementations that hallucinate less than prevalent LLM solutions. Keywords: knowledge graph, knowledge graph completion, entity alignment, representation learning, machine learning
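For readers unfamiliar with the Hits@10 metric cited above, the following is a minimal sketch of how it is conventionally computed in link-prediction evaluation; the rank values are hypothetical and for illustration only, not taken from the paper.

```python
def hits_at_k(ranks, k=10):
    """Hits@k: fraction of test triples whose correct entity is ranked
    within the top k candidates (ranks are 1-based)."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

# Hypothetical ranks of the true entity across ten test triples.
ranks = [1, 3, 12, 7, 54, 2, 9, 11, 4, 8]
print(hits_at_k(ranks, k=10))  # 0.7 — seven of ten ranks fall within the top 10
```

A 44% increase in this metric, as reported in the abstract, means the enriched embeddings rank the correct entity in the top 10 substantially more often than embeddings trained on the small-scale KG alone.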

Authors (4)
  1. Albert Sawczyn
  2. Jakub Binkowski
  3. Piotr Bielak
  4. Tomasz Kajdanowicz