
Prompting Large Language Models with Knowledge Graphs for Question Answering Involving Long-tail Facts (2405.06524v1)

Published 10 May 2024 in cs.CL

Abstract: Although LLMs are effective in performing various NLP tasks, they still struggle to handle tasks that require extensive, real-world knowledge, especially when dealing with long-tail facts (facts related to long-tail entities). This limitation highlights the need to supplement LLMs with non-parametric knowledge. To address this issue, we analysed the effects of different types of non-parametric knowledge, including textual passages and knowledge graphs (KGs). Since LLMs have probably seen the majority of factual question-answering datasets already, to facilitate our analysis, we propose a fully automatic pipeline for creating a benchmark that requires knowledge of long-tail facts for answering the involved questions. Using this pipeline, we introduce the LTGen benchmark. We evaluate state-of-the-art LLMs in different knowledge settings using the proposed benchmark. Our experiments show that LLMs alone struggle with answering these questions, especially when the long-tail level is high or rich knowledge is required. Nonetheless, the performance of the same models improves significantly when they are prompted with non-parametric knowledge. We observed that, in most cases, prompting LLMs with KG triples surpasses passage-based prompting using a state-of-the-art retriever. In addition, while prompting LLMs with both KG triples and documents does not consistently improve knowledge coverage, it can dramatically reduce hallucinations in the generated content.
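To make the knowledge settings mentioned in the abstract concrete, the following is a minimal sketch of how a question might be combined with KG triples, passages, or both before being sent to an LLM. It is not the authors' implementation: the function names (`linearise_triples`, `build_prompt`), the prompt wording, and the placeholder entities are all assumptions chosen for illustration, and the abstract does not specify the actual prompt template.

```python
from typing import Iterable, List, Sequence, Tuple

# A KG fact as (subject, predicate, object). Hypothetical representation.
Triple = Tuple[str, str, str]


def linearise_triples(triples: Iterable[Triple]) -> str:
    """Render each triple on its own line as "(subject, predicate, object)"."""
    return "\n".join(f"({s}, {p}, {o})" for s, p, o in triples)


def build_prompt(
    question: str,
    triples: Sequence[Triple] = (),
    passages: Sequence[str] = (),
) -> str:
    """Assemble a prompt for one knowledge setting.

    - no triples, no passages: the LLM answers from parametric knowledge only
    - triples only:            KG-based prompting
    - passages only:           passage-based prompting
    - both:                    combined prompting
    The section labels below are illustrative assumptions.
    """
    parts: List[str] = []
    if triples:
        parts.append("Knowledge graph triples:\n" + linearise_triples(triples))
    if passages:
        parts.append("Passages:\n" + "\n".join(f"- {p}" for p in passages))
    parts.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    # Placeholder entities, not real benchmark data.
    demo = build_prompt(
        "Where was Entity A born?",
        triples=[("Entity A", "place of birth", "City B")],
        passages=["Entity A is a person born in City B."],
    )
    print(demo)
```

Calling `build_prompt` with only `triples`, only `passages`, or both gives one prompt per knowledge setting, so the same question can be compared across settings with a single helper.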

Authors (6)
  1. Wenyu Huang
  2. Guancheng Zhou
  3. Mirella Lapata
  4. Pavlos Vougiouklis
  5. Sebastien Montella
  6. Jeff Z. Pan