
Data Distribution Bottlenecks in Grounding Language Models to Knowledge Bases (2309.08345v3)

Published 15 Sep 2023 in cs.CL and cs.AI

Abstract: Language models (LMs) have already demonstrated remarkable abilities in understanding and generating both natural and formal language. Despite these advances, their integration with real-world environments such as large-scale knowledge bases (KBs) remains an underdeveloped area, affecting applications such as semantic parsing and leading to "hallucinated" information. This paper is an experimental investigation aimed at uncovering the robustness challenges that LMs encounter when tasked with knowledge base question answering (KBQA). The investigation covers scenarios with inconsistent data distribution between training and inference, such as generalization to unseen domains, adaptation to various language variations, and transferability across different datasets. Our comprehensive experiments reveal that, even when combined with our proposed data augmentation techniques, advanced small and large LMs exhibit poor performance along these dimensions. While the LM is a promising technology, the robustness of its current form in dealing with complex environments is fragile and of limited practicality because of data distribution issues. This calls for future research on data collection and LM learning paradigms.
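
To make the distribution-shift settings in the abstract concrete, below is a minimal, hypothetical sketch (not code from the paper) of a non-i.i.d. KBQA split in which entire KB domains are held out of training, mirroring the "generalization to unseen domains" scenario. The function name, the heldout_fraction parameter, and the example fields "question", "s_expression", and "domain" are illustrative assumptions, not artifacts of the paper.

# Illustrative sketch: hold out whole KB domains so the test distribution
# differs from training. All names and fields here are hypothetical.
from collections import defaultdict
import random

def split_by_unseen_domain(examples, heldout_fraction=0.2, seed=0):
    """Hold out entire KB domains so that test questions target schema items
    never observed during training (a non-i.i.d. train/test split)."""
    by_domain = defaultdict(list)
    for ex in examples:
        by_domain[ex["domain"]].append(ex)

    domains = sorted(by_domain)              # deterministic order before shuffling
    random.Random(seed).shuffle(domains)
    n_heldout = max(1, int(len(domains) * heldout_fraction))
    heldout = set(domains[:n_heldout])

    train = [ex for d, exs in by_domain.items() if d not in heldout for ex in exs]
    test = [ex for d, exs in by_domain.items() if d in heldout for ex in exs]
    return train, test

# Toy usage: with two domains and heldout_fraction=0.5, one domain appears
# only at test time, so its schema items are unseen during training.
data = [
    {"question": "who founded xyz corp", "s_expression": "(JOIN founder xyz)", "domain": "business"},
    {"question": "longest river in asia", "s_expression": "(ARGMAX river length)", "domain": "geography"},
]
train_set, test_set = split_by_unseen_domain(data, heldout_fraction=0.5)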

Authors (2)
  1. Yiheng Shu (9 papers)
  2. Zhiwei Yu (10 papers)
Citations (3)