
Right for Right Reasons: Large Language Models for Verifiable Commonsense Knowledge Graph Question Answering (2403.01390v2)

Published 3 Mar 2024 in cs.CL

Abstract: Knowledge Graph Question Answering (KGQA) methods seek to answer Natural Language questions using the relational information stored in Knowledge Graphs (KGs). With the recent advancements of LLMs and their remarkable reasoning abilities, there is a growing trend to leverage them for KGQA. However, existing methodologies have only focused on answering factual questions, e.g., "In which city was Silvio Berlusconi's first wife born?", leaving questions involving commonsense reasoning that real-world users may pose more often, e.g., "Do I need separate visas to see the Venus of Willendorf and attend the Olympics this summer?", unaddressed. In this work, we first observe that existing LLM-based methods for KGQA struggle with hallucination on such questions, especially on queries targeting long-tail entities (e.g., non-mainstream and recent entities), thus hindering their applicability in real-world settings, especially since their reasoning processes are not easily verifiable. In response, we propose Right for Right Reasons (R3), a commonsense KGQA methodology that allows for a verifiable reasoning procedure by axiomatically surfacing intrinsic commonsense knowledge of LLMs and grounding every factual reasoning step on KG triples. Through experimental evaluations across three different tasks -- question answering, claim verification, and preference matching -- our findings showcase R3 as a superior approach, outperforming existing methodologies and notably reducing instances of hallucination and reasoning errors.
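The abstract's central idea -- surface a commonsense axiom from the LLM, then ground every factual reasoning step on KG triples before answering -- can be illustrated with a minimal sketch. This is not the authors' implementation: the toy knowledge graph, the `answer` loop, and the hard-coded axiom (standing in for LLM-surfaced commonsense) are all illustrative assumptions.

```python
# Hypothetical sketch of an R3-style verifiable reasoning loop.
# The KG, relation names, and axiom below are illustrative stand-ins,
# not the paper's actual data or code.

KG = {
    ("Venus_of_Willendorf", "located_in", "Austria"),
    ("2024_Summer_Olympics", "held_in", "France"),
    ("Austria", "member_of", "Schengen_Area"),
    ("France", "member_of", "Schengen_Area"),
}

def grounded(step):
    """A factual reasoning step is verifiable only if its triple is in the KG."""
    return step in KG

def answer(factual_steps, axiom):
    # Ground every factual step on a KG triple; if any step is not
    # grounded, abstain and report it rather than risk a hallucination.
    for step in factual_steps:
        if not grounded(step):
            return ("abstain", step)
    # All facts verified: apply the surfaced commonsense axiom to them.
    return ("answer", axiom(factual_steps))

def one_visa_suffices(steps):
    # Stand-in for an LLM-surfaced commonsense axiom: a single Schengen
    # visa covers travel within all Schengen member states.
    regions = {obj for (_, rel, obj) in steps if rel == "member_of"}
    return regions == {"Schengen_Area"}

steps = [
    ("Venus_of_Willendorf", "located_in", "Austria"),
    ("Austria", "member_of", "Schengen_Area"),
    ("2024_Summer_Olympics", "held_in", "France"),
    ("France", "member_of", "Schengen_Area"),
]
status, result = answer(steps, one_visa_suffices)
# status == "answer", result == True: one visa suffices, and every
# factual step in the chain can be checked against the KG.
```

The point of the sketch is the abstain branch: because each factual step must match a KG triple, an unsupported claim halts the chain and is reported, which is what makes the reasoning procedure verifiable.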

Authors (4)
  1. Armin Toroghi (8 papers)
  2. Willis Guo (3 papers)
  3. Mohammad Mahdi Abdollah Pour (3 papers)
  4. Scott Sanner (70 papers)
Citations (3)