Disentangling comprehension from memorization on the Swahili proverbs task

Ascertain whether the observed accuracy of large language models on the BIG-bench swahili_english_proverbs task reflects genuine Swahili language understanding or instead memorization of specific proverbs from Internet sources used during pretraining.

Background

In evaluating low-resource language tasks, the authors report that models achieve 43% accuracy (4-way multiple choice) on the Swahili–English proverb matching task, with performance improving with scale. However, because some of these proverbs are likely present on the web, it is unclear whether models are demonstrating language comprehension or simply retrieving memorized content from their training data. Resolving this ambiguity is important for assessing genuine multilingual capability.
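
One way to make the question concrete is a contamination split: score the task separately on proverbs that are attested in web or pretraining sources and on proverbs that are not, then compare each split with the chance level (25% for 4-way multiple choice). The sketch below is a minimal illustration under stated assumptions, not the authors' method. The `Item` record, its hypothetical `on_web` flag, and the `split_accuracy` helper are all introduced here for illustration, and how `on_web` would actually be determined (for example, by searching a pretraining corpus) is left open. If accuracy on unattested proverbs stays near chance while attested ones are answered well, that would point toward memorization rather than comprehension.

```python
from dataclasses import dataclass


@dataclass
class Item:
    correct: bool  # whether the model selected the right English counterpart
    on_web: bool   # hypothetical flag: proverb judged to appear in web/pretraining sources


def split_accuracy(items, n_choices=4):
    """Accuracy on web-attested items, on unattested items, and the chance level."""
    def acc(group):
        # Fraction correct; NaN when a split is empty so it is not mistaken for 0%.
        return sum(i.correct for i in group) / len(group) if group else float("nan")

    seen = [i for i in items if i.on_web]
    unseen = [i for i in items if not i.on_web]
    return {"seen": acc(seen), "unseen": acc(unseen), "chance": 1 / n_choices}
```

For instance, three attested items with two answered correctly and four unattested items with one answered correctly give `seen` ≈ 0.67 and `unseen` = 0.25, the latter equal to chance.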

References

However, it is not clear whether this performance indicates general understanding of Swahili or instead memorization of proverbs listed on the internet.

— Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models (2206.04615 - Srivastava et al., 2022) in Section “Performance on non-English languages,” subsection “Low-resource language tasks are particularly challenging”
