
Refining Targeted Syntactic Evaluation of Language Models (2104.09635v1)

Published 19 Apr 2021 in cs.CL

Abstract: Targeted syntactic evaluation of subject-verb number agreement in English (TSE) evaluates LLMs' syntactic knowledge using hand-crafted minimal pairs of sentences that differ only in the main verb's conjugation. The method evaluates whether LLMs rate each grammatical sentence as more likely than its ungrammatical counterpart. We identify two distinct goals for TSE. First, evaluating the systematicity of an LLM's syntactic knowledge: given a sentence, can it conjugate arbitrary verbs correctly? Second, evaluating a model's likely behavior: given a sentence, does the model concentrate its probability mass on correctly conjugated verbs, even if only on a subset of the possible verbs? We argue that current implementations of TSE do not directly capture either of these goals, and propose new metrics to capture each goal separately. Under our metrics, we find that TSE overestimates the systematicity of LLMs, but that models score up to 40% better on verbs that they predict are likely in context.
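To make the evaluation concrete, the sketch below shows the standard TSE comparison the abstract describes: score both members of a minimal pair under a language model and count the pair as correct if the grammatical sentence is rated more likely. This is a minimal illustration, assuming a Hugging Face causal LM (GPT-2 here); the helper names and example pair are ours, not from the paper.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


def sentence_logprob(model, tokenizer, sentence):
    """Total log-probability a causal LM assigns to a sentence."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)
    # out.loss is the mean negative log-likelihood over the
    # (len - 1) predicted tokens; undo the mean to get the total.
    return -out.loss.item() * (ids.size(1) - 1)


def tse_pair_correct(model, tokenizer, grammatical, ungrammatical):
    """TSE counts a minimal pair as correct iff the model rates the
    grammatical sentence as more likely than its ungrammatical twin."""
    return (sentence_logprob(model, tokenizer, grammatical)
            > sentence_logprob(model, tokenizer, ungrammatical))


tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2")
lm.eval()

# Minimal pair differing only in the main verb's conjugation.
print(tse_pair_correct(lm, tok,
                       "The keys to the cabinet are on the table.",
                       "The keys to the cabinet is on the table."))
```

The paper's critique targets exactly this setup: a pair-level comparison over hand-picked verbs conflates whether the model conjugates arbitrary verbs correctly (systematicity) with whether it concentrates probability mass on some correctly conjugated verbs (likely behavior), which is why the authors propose separate metrics for each.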

Citations (37)
