
Think-on-Graph: Deep and Responsible Reasoning of Large Language Model on Knowledge Graph (2307.07697v6)

Published 15 Jul 2023 in cs.CL

Abstract: Although LLMs have achieved significant success in various tasks, they often struggle with hallucination problems, especially in scenarios requiring deep and responsible reasoning. These issues could be partially addressed by introducing external knowledge graphs (KG) in LLM reasoning. In this paper, we propose a new LLM-KG integrating paradigm "LLM ⊗ KG" which treats the LLM as an agent to interactively explore related entities and relations on KGs and perform reasoning based on the retrieved knowledge. We further implement this paradigm by introducing a new approach called Think-on-Graph (ToG), in which the LLM agent iteratively executes beam search on KG, discovers the most promising reasoning paths, and returns the most likely reasoning results. We use a number of well-designed experiments to examine and illustrate the following advantages of ToG: 1) compared with LLMs, ToG has better deep reasoning power; 2) ToG has the ability of knowledge traceability and knowledge correctability by leveraging LLMs reasoning and expert feedback; 3) ToG provides a flexible plug-and-play framework for different LLMs, KGs and prompting strategies without any additional training cost; 4) the performance of ToG with small LLM models could exceed large LLM such as GPT-4 in certain scenarios and this reduces the cost of LLM deployment and application. As a training-free method with lower computational cost and better generality, ToG achieves overall SOTA in 6 out of 9 datasets where most previous SOTAs rely on additional training.

Introduction

LLMs are quickly becoming adept at generating coherent and context-appropriate responses across a variety of tasks. Despite their advances, challenges emerge when LLMs are tasked with complex reasoning that demands a deep understanding of factual knowledge. LLMs may generate inaccurate or outdated responses or engage in "hallucination" – providing seemingly confident answers that are unrelated to reality. Moreover, training LLMs can be resource-intensive and does not ensure current knowledge due to the static nature of training datasets.

To enhance the reasoning capabilities of LLMs and keep their knowledge current, integration with external knowledge repositories such as knowledge graphs (KGs) presents a promising solution. Prior approaches to integrating LLMs and KGs (denoted LLM ⊕ KG) have improved LLM performance, but their loose-coupling paradigm leaves much of the KGs' potential unexploited. To address this, the paper introduces a tight-coupling paradigm, "LLM ⊗ KG", realized through an approach called Think-on-Graph (ToG).

Methodology

Think-on-Graph (ToG) uses an LLM to interactively explore a KG and identify reasoning paths, without any additional training. It prompts the LLM to traverse multiple candidate reasoning paths on the KG, refining its selections iteratively until sufficient information is gathered or a maximum search depth is reached. The selected paths then serve as the basis for the LLM to perform logical reasoning and produce answers. An extension, ToG-R (Relation-based Think-on-Graph), explores relation chains instead of entity-relation triples, streamlining reasoning by emphasizing the relevance of relations to the question over the entities themselves.

The process has three key phases (a minimal code sketch of the full loop follows the list):

  1. Initialization: Utilizing LLMs to identify initial entities related to the query.
  2. Exploration: Directing LLMs to search for and prune relevant relations and entities within the KG.
  3. Reasoning: Prompting LLMs to evaluate whether the current reasoning paths are sufficient to answer the question. If not, the model loops back through exploration and reasoning until sufficient information is accumulated or the maximum search depth is reached.
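
To make the three phases concrete, the following is a minimal Python sketch of one way such an exploration-and-reasoning loop could be organized. It assumes a caller-supplied llm(prompt) -> str function and a small KG interface exposing relations(entity) and tails(entity, relation); these names, the prompt wording, and the numeric-score pruning are illustrative assumptions, not the paper's released implementation.

    # Minimal ToG-style loop (illustrative sketch, not the authors' code).
    # Assumes: llm(prompt) returns a string; kg.relations(e) lists relations of
    # entity e; kg.tails(e, r) lists tail entities reachable from e via r.
    def think_on_graph(question, llm, kg, width=3, max_depth=3):
        # Phase 1: initialization - ask the LLM for the topic entities of the question.
        topics = [t.strip() for t in llm(f"List the topic entities in: {question}").split(",")]
        paths = [[t] for t in topics]  # each reasoning path starts at a topic entity

        for _ in range(max_depth):
            # Phase 2: exploration - extend every path by one (relation, entity) hop.
            candidates = []
            for path in paths:
                head = path[-1] if isinstance(path[-1], str) else path[-1][1]
                for rel in kg.relations(head):
                    for tail in kg.tails(head, rel):
                        candidates.append(path + [(rel, tail)])

            # Beam-search pruning: the LLM scores each candidate path and only the
            # `width` most promising ones are kept. (Assumes the LLM replies with a
            # bare number; a real implementation would parse more defensively.)
            scored = [(float(llm(f"Score 0-1 how useful this path is for answering "
                                 f"'{question}': {cand}")), cand)
                      for cand in candidates]
            paths = [cand for _, cand in
                     sorted(scored, key=lambda s: s[0], reverse=True)[:width]]

            # Phase 3: reasoning - stop early if the retained paths already suffice.
            verdict = llm(f"Question: {question}\nPaths: {paths}\n"
                          f"Do these paths contain enough information to answer? (yes/no)")
            if verdict.strip().lower().startswith("yes"):
                break

        # Final answer generated from the selected reasoning paths.
        return llm(f"Answer '{question}' using only these reasoning paths: {paths}")

ToG-R would follow the same loop but prune on candidate relations alone, treating the intermediate entities as interchangeable, which removes the entity-scoring calls to the LLM.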

Advantages and Experimentation

Experiments show several advantages of ToG:

  • Enhanced deep reasoning through multi-hop paths.
  • Knowledge traceability and correctability due to explicit, editable paths.
  • Flexibility in applying various LLMs and KGs.
  • Efficient knowledge updating through the KG rather than retraining, and strong generality with no additional training cost.

In tests, even when built on smaller models such as LLaMA2-70B, ToG matches or surpasses larger LLMs such as GPT-4 in specific scenarios, suggesting a cost-effective alternative for deploying LLMs.

Conclusion

The Think-on-Graph approach demonstrates state-of-the-art (SOTA) performance across various datasets and tasks, showcasing its potent mix of generality, efficiency, and reasoning ability. Harnessing both structured, editable KGs and the powerful reasoning ability of LLMs, ToG advances the effectiveness of LLMs in knowledge-intensive tasks, offering a compelling solution to the problem of knowledge hallucination. Furthermore, by providing an avenue for knowledge traceability and correctability, ToG aligns with the goal of responsible AI development.

Authors (9)
  1. Jiashuo Sun (11 papers)
  2. Chengjin Xu (36 papers)
  3. Lumingyuan Tang (4 papers)
  4. Saizhuo Wang (16 papers)
  5. Chen Lin (75 papers)
  6. Yeyun Gong (78 papers)
  7. Lionel M. Ni (20 papers)
  8. Heung-Yeung Shum (32 papers)
  9. Jian Guo (76 papers)