Automated Construction of Theme-specific Knowledge Graphs (2404.19146v1)

Published 29 Apr 2024 in cs.AI and cs.IR

Abstract: Despite widespread applications of knowledge graphs (KGs) in tasks such as question answering and intelligent conversational systems, existing KGs face two major challenges: coarse information granularity and a deficiency in timeliness. These considerably hinder the retrieval and analysis of in-context, fine-grained, and up-to-date knowledge from KGs, particularly in highly specialized themes (e.g., specialized scientific research) and rapidly evolving contexts (e.g., breaking news or disaster tracking). To tackle such challenges, we propose the theme-specific knowledge graph (i.e., ThemeKG), a KG constructed from a theme-specific corpus, and design an unsupervised framework for ThemeKG construction (named TKGCon). The framework takes a raw theme-specific corpus and generates a high-quality KG that includes salient entities and relations under the theme. Specifically, we start with an entity ontology of the theme from Wikipedia, based on which we then generate candidate relations with LLMs to construct a relation ontology. To parse the documents from the theme corpus, we first map the extracted entity pairs to the ontology and retrieve the candidate relations. Finally, we incorporate the context and ontology to consolidate the relations for the entity pairs. We observe that directly prompting GPT-4 for a theme-specific KG leads to inaccurate entities (e.g., "two main types" returned as a single entity) and unclear relations (e.g., "is", "has") or wrong relations (e.g., "have due to", "to start"). In contrast, by constructing the theme-specific KG step by step, our model outperforms GPT-4 and consistently identifies accurate entities and relations. Experimental results also show that our framework excels in evaluations against various KG construction baselines.
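The four-stage pipeline sketched in the abstract (entity ontology → relation ontology → ontology mapping → context-based consolidation) can be illustrated with a toy Python sketch. All names, data, and the keyword-overlap scoring below are hypothetical stand-ins: the actual framework derives its entity ontology from Wikipedia and uses LLM prompting to generate candidate relations and to consolidate them with context.

```python
# Illustrative sketch of the TKGCon stages; toy dictionaries stand in
# for the Wikipedia-derived ontology and the LLM calls.

# Stage 1: theme entity ontology (entity -> coarse category).
entity_ontology = {
    "lithium": "Material",
    "battery fire": "Hazard",
}

# Stage 2: relation ontology over category pairs. In the real framework
# these candidates are generated by prompting an LLM; here they are
# hard-coded for illustration.
relation_ontology = {
    ("Material", "Hazard"): ["can cause", "is unrelated to"],
}

def map_to_ontology(head: str, tail: str) -> tuple[str, str]:
    """Stage 3: map an extracted entity pair to its ontology categories."""
    return entity_ontology[head], entity_ontology[tail]

def consolidate_relation(head: str, tail: str, context: str) -> str:
    """Stage 4: pick the candidate relation best supported by the context.
    Keyword overlap is a crude stand-in for the context+ontology scoring."""
    candidates = relation_ontology[map_to_ontology(head, tail)]
    context_words = set(context.lower().split())
    return max(candidates,
               key=lambda rel: len(set(rel.split()) & context_words))

# Parse one sentence from a hypothetical theme corpus.
triple = ("lithium",
          consolidate_relation("lithium", "battery fire",
                               "Overheated lithium cells can cause fires."),
          "battery fire")
print(triple)  # ('lithium', 'can cause', 'battery fire')
```

The point of the staging is visible even in this sketch: the relation ontology restricts the choice to well-formed candidates, so the consolidation step can never emit a malformed relation like "have due to", which is exactly the failure mode the paper reports for single-shot GPT-4 prompting.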


