Can Language Models Act as Knowledge Bases at Scale? (2402.14273v1)

Published 22 Feb 2024 in cs.CL

Abstract: LLMs have demonstrated remarkable proficiency in understanding and generating responses to complex queries through large-scale pre-training. However, the efficacy of these models in memorizing and reasoning over large-scale structured knowledge, especially world knowledge that explicitly covers abundant factual information, remains questionable. Addressing this gap, our research investigates whether LLMs can effectively store, recall, and reason with knowledge on a scale comparable to the latest knowledge bases (KBs) such as Wikidata. Specifically, we focus on three crucial aspects to study this viability: (1) the efficiency of LLMs of different sizes in memorizing the exact knowledge in the large-scale KB; (2) the flexibility of recalling the memorized knowledge in response to natural language queries; (3) the capability to infer new knowledge through reasoning. Our findings indicate that while LLMs hold promise as large-scale KBs capable of retrieving and responding with flexibility, enhancements in their reasoning capabilities are necessary to fully realize their potential.
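
To make the first two aspects concrete, the sketch below illustrates one plausible way to probe whether a model has memorized KB facts: verbalize Wikidata-style (subject, relation, object) triples into cloze prompts and measure exact-match recall of the object. This is a minimal illustration, not the paper's actual protocol; the triple verbalization, the `query_model` callable, and the example facts are all hypothetical placeholders.

```python
from typing import Callable, Iterable, Tuple

# A Wikidata-style fact: (subject label, relation label, object label).
Triple = Tuple[str, str, str]

def verbalize(triple: Triple) -> Tuple[str, str]:
    """Turn a (subject, relation, object) triple into a cloze-style prompt
    and the gold answer the model is expected to recall."""
    subj, rel, obj = triple
    return f"{subj} {rel} ", obj

def exact_match_recall(
    triples: Iterable[Triple],
    query_model: Callable[[str], str],  # hypothetical: prompt -> model completion
) -> float:
    """Fraction of probed facts whose object the model reproduces verbatim."""
    triples = list(triples)
    hits = 0
    for triple in triples:
        prompt, gold = verbalize(triple)
        prediction = query_model(prompt).strip()
        if prediction.lower().startswith(gold.lower()):
            hits += 1
    return hits / max(len(triples), 1)

if __name__ == "__main__":
    facts = [
        ("Douglas Adams", "was born in", "Cambridge"),
        ("The Eiffel Tower", "is located in", "Paris"),
    ]
    # Stand-in for a fine-tuned LLM; replace with a real generation call.
    fake_model = lambda prompt: "Cambridge" if "Douglas Adams" in prompt else "London"
    print(f"exact-match recall: {exact_match_recall(facts, fake_model):.2f}")
```

Testing recall flexibility would then amount to replacing the rigid template in `verbalize` with paraphrased natural language questions while keeping the same gold objects.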

Authors (3)
  1. Qiyuan He (10 papers)
  2. Yizhong Wang (42 papers)
  3. Wenya Wang (40 papers)
Citations (7)