
Does He Wink or Does He Nod? A Challenging Benchmark for Evaluating Word Understanding of Language Models (2102.03596v1)

Published 6 Feb 2021 in cs.CL

Abstract: Recent progress in pretraining language models on large corpora has resulted in large performance gains on many NLP tasks. These large models acquire linguistic knowledge during pretraining, which helps to improve performance on downstream tasks via fine-tuning. To assess what kind of knowledge is acquired, language models are commonly probed by querying them with `fill in the blank' style cloze questions. Existing probing datasets mainly focus on knowledge about relations between words and entities. We introduce WDLMPro (Word Definition Language Model Probing) to evaluate word understanding directly using dictionary definitions of words. In our experiments, three popular pretrained language models struggle to match words and their definitions. This indicates that they understand many words poorly and that our new probing task is a difficult challenge that could help guide research on LMs in the future.
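The matching task the abstract describes — deciding which dictionary definition belongs to which word — can be framed as a matching-accuracy evaluation. The sketch below illustrates that framing only; `toy_score` is a hypothetical stand-in (simple character overlap) for a real language-model score, and the paper's actual scoring procedure is not specified here.

```python
def toy_score(word: str, definition: str) -> float:
    # Hypothetical stand-in for a language-model score of how well a
    # definition fits a word: the fraction of the word's characters
    # that also appear in the definition. A real probe would instead
    # score the pair with a pretrained language model.
    d = definition.lower()
    return sum(ch in d for ch in word.lower()) / max(len(word), 1)


def match_accuracy(words, definitions):
    # Assign each word the highest-scoring definition and report the
    # fraction of words matched to their own (same-index) definition.
    correct = 0
    for i, word in enumerate(words):
        scores = [toy_score(word, d) for d in definitions]
        if scores.index(max(scores)) == i:
            correct += 1
    return correct / len(words)


# Example in the spirit of the paper's title: wink vs. nod.
words = ["wink", "nod"]
definitions = [
    "to close and open one eye quickly, typically as a signal",
    "to lower and raise the head slightly, for example to show agreement",
]
acc = match_accuracy(words, definitions)
```

A model that truly understands the words should recover the correct word-definition pairing; the paper reports that popular pretrained models struggle to do so on WDLMPro.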

Authors (2)
  1. Hinrich Schütze (250 papers)
  2. Lutfi Kerem Senel (3 papers)
Citations (5)
