Graphusion: A RAG Framework for Knowledge Graph Construction with a Global Perspective (2410.17600v2)
Abstract: Knowledge Graphs (KGs) are crucial in the field of artificial intelligence and are widely used in downstream tasks such as question answering (QA). Constructing KGs typically requires significant effort from domain experts. Large language models (LLMs) have recently been used for Knowledge Graph Construction (KGC). However, most existing approaches take a local perspective: they extract knowledge triplets from individual sentences or documents and lack a fusion process that combines the knowledge into a global KG. This work introduces Graphusion, a zero-shot framework for KGC from free text. It contains three steps: in Step 1, we extract a list of seed entities using topic modeling to guide the final KG to include the most relevant entities; in Step 2, we extract candidate triplets using LLMs; in Step 3, we design a novel fusion module that provides a global view of the extracted knowledge, incorporating entity merging, conflict resolution, and novel triplet discovery. Results show that Graphusion achieves scores of 2.92 and 2.37 out of 3 for entity extraction and relation recognition, respectively. Moreover, we showcase how Graphusion can be applied to the NLP domain and validate it in an educational scenario. Specifically, we introduce TutorQA, a new expert-verified QA benchmark comprising six tasks and a total of 1,200 QA pairs. Using the Graphusion-constructed KG, we achieve a significant improvement on the benchmark, for example, a 9.2% accuracy improvement on sub-graph completion.
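The three steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. Every function name here is hypothetical. Document-frequency ranking stands in for topic modeling in Step 1, the LLM in Step 2 is any callable supplied by the caller, and Step 3 covers only entity merging (through a caller-provided alias map) and conflict resolution (by majority vote over relations). Novel triplet discovery is omitted, and relation labels such as `Prerequisite_of` are illustrative.

```python
from collections import Counter
from typing import Callable

Triplet = tuple[str, str, str]  # (head, relation, tail)


def extract_seed_entities(docs: list[str], candidates: list[str], top_k: int) -> list[str]:
    """Step 1 stand-in: rank candidate terms by how many documents mention them.

    Assumption: simple document frequency replaces topic modeling here.
    """
    counts: Counter = Counter()
    for doc in docs:
        text = doc.lower()
        for term in candidates:
            if term.lower() in text:
                counts[term] += 1
    return [term for term, _ in counts.most_common(top_k)]


def extract_candidates(
    docs: list[str],
    seeds: list[str],
    llm: Callable[[str, str], list[Triplet]],
) -> list[Triplet]:
    """Step 2: for each seed entity, pass the documents mentioning it to an LLM.

    `llm(seed, context)` is a hypothetical callable that returns triplets.
    """
    triplets: list[Triplet] = []
    for seed in seeds:
        context = "\n".join(d for d in docs if seed.lower() in d.lower())
        if context:
            triplets.extend(llm(seed, context))
    return triplets


def fuse(candidates: list[Triplet], aliases: dict[str, str]) -> list[Triplet]:
    """Step 3 (partial): merge entity aliases, then keep one relation per pair.

    Assumption: conflicts are resolved by majority vote, not by an LLM.
    """
    votes: dict[tuple[str, str], Counter] = {}
    for head, relation, tail in candidates:
        h = aliases.get(head.lower(), head.lower())
        t = aliases.get(tail.lower(), tail.lower())
        votes.setdefault((h, t), Counter())[relation] += 1
    return sorted((h, c.most_common(1)[0][0], t) for (h, t), c in votes.items())
```

The fusion step is what separates a global view from per-document extraction: triplets produced for different seeds are pooled first, so aliases such as an abbreviation and its full form collapse into one node before conflicting relations between the same pair are compared.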