BEAR: A Unified Framework for Evaluating Relational Knowledge in Causal and Masked Language Models (2404.04113v1)

Published 5 Apr 2024 in cs.CL

Abstract: Knowledge probing assesses the degree to which a language model (LM) has successfully learned relational knowledge during pre-training. Probing is an inexpensive way to compare LMs of different sizes and training configurations. However, previous approaches rely on the objective function used in pre-training LMs and are thus applicable only to masked or causal LMs. As a result, comparing different types of LMs becomes impossible. To address this, we propose an approach that uses an LM's inherent ability to estimate the log-likelihood of any given textual statement. We carefully design an evaluation dataset of 7,731 instances (40,916 in a larger variant) from which we produce alternative statements for each relational fact, one of which is correct. We then evaluate whether an LM correctly assigns the highest log-likelihood to the correct statement. Our experimental evaluation of 22 common LMs shows that our proposed framework, BEAR, can effectively probe for knowledge across different LM types. We release the BEAR datasets and an open-source framework that implements the probing approach to the research community to facilitate the evaluation and development of LMs.
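
The scoring step described in the abstract can be illustrated with a short sketch. The snippet below is a minimal, hypothetical illustration rather than the released BEAR framework: it assumes the Hugging Face transformers API, uses GPT-2 as a stand-in causal LM, and invents an example fact and answer options. It scores each alternative statement by its summed log-likelihood and checks whether the correct statement ranks highest; masked LMs would instead require a pseudo-log-likelihood score.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # any causal LM from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def statement_log_likelihood(text: str) -> float:
    """Summed token log-likelihood of `text` under the causal LM."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    # out.loss is the mean negative log-likelihood over predicted positions
    # (labels are shifted internally, so n_tokens - 1 positions are scored);
    # multiply back to recover the summed log-likelihood.
    n_scored = enc["input_ids"].shape[1] - 1
    return -out.loss.item() * n_scored

# One relational fact rendered as alternative statements; exactly one is correct.
# The template and answer options here are illustrative, not taken from BEAR.
options = ["Paris", "Berlin", "Madrid", "Rome"]
statements = [f"The capital of France is {option}." for option in options]

scores = [statement_log_likelihood(s) for s in statements]
prediction = options[scores.index(max(scores))]
print(prediction)  # the instance counts as correct if "Paris" scores highest
```

Aggregating this per-instance check over all dataset instances yields the probe's accuracy for a given model, which is what makes scores comparable across causal and masked LMs.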
