NoCoLA: The Norwegian Corpus of Linguistic Acceptability (2306.07790v1)
Published 13 Jun 2023 in cs.CL and cs.AI
Abstract: While there has been a surge of LLMs for Norwegian in recent years, we lack any tool to evaluate their understanding of grammaticality. We present two new Norwegian datasets for this task. NoCoLA_class is a supervised binary classification task where the goal is to discriminate between acceptable and non-acceptable sentences. On the other hand, NoCoLA_zero is a purely diagnostic task for evaluating the grammatical judgement of an LLM in a completely zero-shot manner, i.e. without any further training. In this paper, we describe both datasets in detail, show how to use them for different flavors of LLMs, and conduct a comparative study of the existing Norwegian LLMs.
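As a rough illustration of what a NoCoLA_zero-style evaluation could look like in practice, the sketch below scores a minimal pair with a masked language model by comparing pseudo-log-likelihoods (each token is masked in turn and its log-probability summed). This is not the authors' code: the Hugging Face model name, the scoring procedure, and the example sentence pair are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch (assumptions noted above): zero-shot acceptability judgement
# via masked-LM pseudo-log-likelihood on a single minimal pair.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

MODEL_NAME = "NbAiLab/nb-bert-base"  # placeholder: any Norwegian masked LM
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForMaskedLM.from_pretrained(MODEL_NAME).eval()

def pseudo_log_likelihood(sentence: str) -> float:
    """Sum of log P(token | rest) with each token masked in turn."""
    input_ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    total = 0.0
    with torch.no_grad():
        for i in range(1, input_ids.size(0) - 1):  # skip [CLS] and [SEP]
            masked = input_ids.clone()
            masked[i] = tokenizer.mask_token_id
            logits = model(masked.unsqueeze(0)).logits[0, i]
            total += torch.log_softmax(logits, dim=-1)[input_ids[i]].item()
    return total

# Illustrative minimal pair (not from the dataset): the model "passes"
# if it assigns a higher score to the acceptable sentence.
acceptable = "Hun har lest boka."
unacceptable = "Hun har lese boka."
print(pseudo_log_likelihood(acceptable) > pseudo_log_likelihood(unacceptable))
```

For the supervised NoCoLA_class task, one would instead fine-tune a sequence classifier on the labelled sentences and report a metric such as Matthews correlation, in line with common practice for acceptability-classification benchmarks.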