Graphusion: A RAG Framework for Knowledge Graph Construction with a Global Perspective (2410.17600v2)

Published 23 Oct 2024 in cs.CL, cs.AI, and cs.DB

Abstract: Knowledge Graphs (KGs) are crucial in the field of artificial intelligence and are widely used in downstream tasks such as question answering (QA). Constructing a KG typically requires significant effort from domain experts. Large Language Models (LLMs) have recently been used for Knowledge Graph Construction (KGC). However, most existing approaches take a local perspective: they extract knowledge triplets from individual sentences or documents and lack a fusion process that combines this knowledge into a global KG. This work introduces Graphusion, a zero-shot KGC framework that builds KGs from free text. It consists of three steps: in Step 1, we extract a list of seed entities using topic modeling to ensure that the final KG includes the most relevant entities; in Step 2, we extract candidate triplets using LLMs; in Step 3, we design a novel fusion module that provides a global view of the extracted knowledge, incorporating entity merging, conflict resolution, and novel triplet discovery. Results show that Graphusion achieves scores of 2.92 and 2.37 out of 3 for entity extraction and relation recognition, respectively. Moreover, we show how Graphusion can be applied to the NLP domain and validate it in an educational scenario. Specifically, we introduce TutorQA, a new expert-verified QA benchmark comprising six tasks and a total of 1,200 QA pairs. Using the Graphusion-constructed KG, we achieve significant improvements on the benchmark, for example, a 9.2% accuracy improvement on sub-graph completion.
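The three-step pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and makes several loudly stated assumptions. The seed list stands in for the output of Step 1's topic modeling. The LLM call in Step 2 is replaced by a hypothetical `toy_extractor` that parses pre-formatted `head|relation|tail` strings. The fusion module in Step 3 is approximated by simple swapped-in heuristics: an alias table for entity merging, majority voting for conflict resolution, and one-hop transitivity over a chosen relation for novel triplet discovery. All function names, the alias table, and the relation names (`prerequisite_of`, `used_for`) are illustrative.

```python
from collections import Counter, defaultdict
from typing import Callable

Triplet = tuple[str, str, str]  # (head, relation, tail)


def toy_extractor(doc: str, seeds: list[str]) -> list[Triplet]:
    """Hypothetical stand-in for the LLM call in Step 2 (not a real API).

    Parses 'head|relation|tail' items separated by ';' and keeps only
    triplets that touch at least one seed entity."""
    seed_set = {s.lower() for s in seeds}
    out: list[Triplet] = []
    for item in doc.split(";"):
        head, rel, tail = (part.strip() for part in item.split("|"))
        if head.lower() in seed_set or tail.lower() in seed_set:
            out.append((head, rel, tail))
    return out


def extract_candidates(docs: list[str], seeds: list[str],
                       extractor: Callable[[str, list[str]], list[Triplet]]) -> list[Triplet]:
    """Step 2 (local view): each document is processed independently."""
    candidates: list[Triplet] = []
    for doc in docs:
        candidates.extend(extractor(doc, seeds))
    return candidates


def fuse(candidates: list[Triplet], aliases: dict[str, str]) -> set[Triplet]:
    """Step 3 (global view), assumed heuristics: entity merging, then conflict resolution."""
    def canon(entity: str) -> str:
        key = entity.strip().lower()
        return aliases.get(key, key)

    # Entity merging: map every surface form to one canonical name and
    # count how often each relation is asserted for each (head, tail) pair.
    votes: dict[tuple[str, str], Counter] = defaultdict(Counter)
    for head, rel, tail in candidates:
        h, t = canon(head), canon(tail)
        if h != t:  # skip self-loops introduced by merging
            votes[(h, t)][rel] += 1

    # Conflict resolution: keep only the majority relation for each pair.
    return {(h, counts.most_common(1)[0][0], t) for (h, t), counts in votes.items()}


def discover_transitive(kg: set[Triplet], relation: str) -> set[Triplet]:
    """Novel triplet discovery, approximated here by one-hop transitivity (an assumption)."""
    edges: dict[str, set[str]] = defaultdict(set)
    for h, r, t in kg:
        if r == relation:
            edges[h].add(t)
    linked = {(h, t) for h, _, t in kg}  # pairs that already carry some relation
    new: set[Triplet] = set()
    for a, successors in edges.items():
        for b in successors:
            for c in edges.get(b, set()):
                if c != a and (a, c) not in linked:
                    new.add((a, relation, c))
    return new


docs = [
    "Word2Vec|prerequisite_of|BERT; BERT|prerequisite_of|GPT",
    "word2vec|prerequisite_of|BERT model; BERT model|prerequisite_of|GPT",
    "BERT|used_for|GPT; Python|used_for|scripting",
]
seeds = ["Word2Vec", "BERT", "GPT"]  # stands in for Step 1 (topic modeling)
kg = fuse(extract_candidates(docs, seeds, toy_extractor), {"bert model": "bert"})
new = discover_transitive(kg, "prerequisite_of")
print(sorted(kg))   # [('bert', 'prerequisite_of', 'gpt'), ('word2vec', 'prerequisite_of', 'bert')]
print(sorted(new))  # [('word2vec', 'prerequisite_of', 'gpt')]
```

In this toy run, "BERT model" is merged into "bert", the conflicting BERT/GPT relations are resolved in favor of `prerequisite_of` (two votes against one), the Python triplet is dropped because it touches no seed entity, and one new edge from word2vec to gpt is proposed. Majority voting and transitivity were chosen only because they are easy to read. Transitivity in particular is an assumption that holds for some relations, such as a prerequisite ordering, and not for others.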
