Prompting Large Language Models with Knowledge Graphs for Question Answering Involving Long-tail Facts (2405.06524v1)
Abstract: Although LLMs are effective in performing various NLP tasks, they still struggle to handle tasks that require extensive, real-world knowledge, especially when dealing with long-tail facts (facts related to long-tail entities). This limitation highlights the need to supplement LLMs with non-parametric knowledge. To address this issue, we analyse the effects of different types of non-parametric knowledge, including textual passages and knowledge graphs (KGs). Since LLMs have probably seen the majority of factual question-answering datasets already, we propose a fully automatic pipeline for creating a benchmark whose questions require knowledge of long-tail facts to answer. Using this pipeline, we introduce the LTGen benchmark. We evaluate state-of-the-art LLMs in different knowledge settings using the proposed benchmark. Our experiments show that LLMs alone struggle with answering these questions, especially when the long-tail level is high or rich knowledge is required. Nonetheless, the performance of the same models improves significantly when they are prompted with non-parametric knowledge. We observe that, in most cases, prompting LLMs with KG triples surpasses passage-based prompting using a state-of-the-art retriever. In addition, while prompting LLMs with both KG triples and documents does not consistently improve knowledge coverage, it can dramatically reduce hallucinations in the generated content.
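To make the knowledge settings named in the abstract concrete, here is a minimal Python sketch of how a question might be combined with KG triples, retrieved passages, or both before being sent to an LLM. The abstract does not specify the paper's prompt template, so the function names (`linearize_triples`, `build_prompt`), the section labels, and the "(subject, relation, object)" rendering are all assumptions chosen for illustration. The example entity and triple are placeholders, not data from LTGen.

```python
from typing import List, Optional, Tuple

# A KG triple as (subject, relation, object). This layout is an assumption.
Triple = Tuple[str, str, str]


def linearize_triples(triples: List[Triple]) -> str:
    """Render each triple on its own line as '(subject, relation, object)'."""
    return "\n".join(f"({s}, {r}, {o})" for s, r, o in triples)


def build_prompt(
    question: str,
    triples: Optional[List[Triple]] = None,
    passages: Optional[List[str]] = None,
) -> str:
    """Assemble a prompt for one knowledge setting.

    - no triples, no passages: the LLM answers from parametric knowledge only
    - triples only:            KG-based prompting
    - passages only:           passage-based prompting
    - both:                    combined KG + document prompting
    The section labels are hypothetical, not the paper's template.
    """
    sections = []
    if triples:
        sections.append("Knowledge graph triples:\n" + linearize_triples(triples))
    if passages:
        sections.append("Passages:\n" + "\n".join(passages))
    sections.append(f"Question: {question}")
    sections.append("Answer:")
    return "\n\n".join(sections)


# Placeholder example: "Entity A" and "Town B" are made-up names.
example = build_prompt(
    "Where was Entity A born?",
    triples=[("Entity A", "place of birth", "Town B")],
)
print(example)
```

Calling `build_prompt` with only `passages`, or with both arguments, yields the passage-based and combined settings from the same helper, so the settings differ only in which knowledge sections are present.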
Authors:
- Wenyu Huang
- Guancheng Zhou
- Mirella Lapata
- Pavlos Vougiouklis
- Sebastien Montella
- Jeff Z. Pan