- The paper introduces KoGEM, a benchmark of 1,524 multiple-choice questions spanning five main categories of Korean grammar.
- The paper finds that LLMs excel in morphology and semantics but underperform humans in phonology and pragmatics.
- The paper demonstrates that supplying LLMs with experiential knowledge significantly improves performance, suggesting directions both for model improvement and for similar benchmarks in other languages.
Evaluation of Linguistic Competence in LLMs and Humans Using KoGEM
This paper introduces the Korean Grammar Evaluation Benchmark (KoGEM), designed to evaluate the linguistic competence of LLMs and human participants in Korean. KoGEM comprises 1,524 multiple-choice questions covering diverse aspects of Korean grammar, organized into five main categories: Phonology, Morphology, Syntax, Semantics, and Norms, and further divided into 16 subcategories.
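To make the benchmark's structure concrete, here is a minimal sketch of how a KoGEM-style item might be represented. The `KoGEMItem` class, its field names, and the example question are illustrative assumptions; the released dataset's actual schema may differ.

```python
# Minimal sketch of a KoGEM-style item; field names are assumptions,
# not the benchmark's actual schema.
from dataclasses import dataclass

CATEGORIES = ["Phonology", "Morphology", "Syntax", "Semantics", "Norms"]

@dataclass
class KoGEMItem:
    question: str        # the Korean grammar question text
    choices: list[str]   # the multiple-choice options
    answer: int          # index of the correct option
    category: str        # one of the five main categories
    subcategory: str     # one of the 16 finer-grained labels

# Hypothetical example item (not taken from the dataset):
item = KoGEMItem(
    question="다음 중 '막다'의 표준 발음은?",  # "Which is the standard pronunciation of 'makda'?"
    choices=["[막따]", "[마따]", "[만따]", "[막다]"],
    answer=0,  # tensification: 막다 is pronounced [막따]
    category="Phonology",
    subcategory="Phonological Alternation",
)
```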
KoGEM is motivated by the need to assess whether LLMs genuinely comprehend linguistic principles rather than merely performing pattern recognition over extensive training data. While LLMs have demonstrated strong language-processing capabilities, the depth of their linguistic understanding remains open to scrutiny, particularly when measured against human linguistic intuition and the experiential knowledge humans derive from real-world language use.
Key Findings
- Performance Across Categories: The evaluation reveals markedly uneven LLM performance across grammatical categories (a per-category scoring sketch follows this list). LLMs excel in Morphology and in aspects of Semantics, where pattern recognition and definitional knowledge dominate, but perform suboptimally in Phonology, where experiential understanding and multimodal reasoning are crucial.
- Human vs. LLM Performance: Human participants consistently outperform LLMs in categories that require intuitive reasoning, such as phonological alternations and pragmatics. The paper attributes this gap to humans' ability to draw naturally on spoken-language experience and nuanced phonological rules, a capability current LLMs lack.
- Incorporation of Experiential Knowledge: Augmenting LLMs with experiential knowledge analogous to human cognitive processes, such as pronunciation information and morphological decomposition, yielded significant performance improvements (a hedged sketch of such augmentation appears in the next section). This points to an avenue for enhancing LLM linguistic competence: embedding real-world experiential signals into model inputs or training.
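The category-level comparison above amounts to computing accuracy per category. Below is a minimal sketch of that scoring, assuming items shaped like the `KoGEMItem` sketch earlier; the `predict` callable stands in for any LLM query and is an assumption, not part of any released KoGEM tooling.

```python
# Per-category accuracy over KoGEM-style items. predict(question, choices)
# is assumed to return the index of the model's chosen option.
from collections import defaultdict

def per_category_accuracy(items, predict):
    """Return {category: accuracy} for a list of KoGEMItem objects."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for item in items:
        total[item.category] += 1
        if predict(item.question, item.choices) == item.answer:
            correct[item.category] += 1
    return {cat: correct[cat] / total[cat] for cat in total}
```

Computing this dictionary once from model predictions and once from human answers reproduces the kind of category-by-category comparison the paper reports.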
Implications and Future Directions
The KoGEM benchmark serves as a detailed framework for probing the linguistic competence of LLMs and identifying where they would benefit from further refinement and real-world grounding. The findings can inform future AI development, particularly efforts to make LLM architectures process language in more human-like ways. Extending similar benchmarks to other languages could strengthen the multilingual capabilities of LLMs and capture broader linguistic nuance.
Moreover, these findings highlight the potential value of integrating multimodal learning and experiential knowledge into LLM training. They align with the proposition that linguistic competence should not rest solely on extensive text-only corpora but should draw on richer, context-aware learning akin to human language acquisition.
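As one concrete illustration of what such experiential augmentation might look like at inference time, the sketch below prepends pronunciation and morpheme hints to a question before it is sent to a model. The helper name `augment_prompt`, its parameters, and the prompt wording are illustrative assumptions, not the paper's exact method.

```python
# Hedged sketch of experiential prompt augmentation: prepend pronunciation
# and morpheme hints before querying a model. Names and wording are
# illustrative, not the paper's exact procedure.

def augment_prompt(question: str, choices: list[str],
                   pronunciation: str | None = None,
                   morphemes: list[str] | None = None) -> str:
    """Build a multiple-choice prompt with optional experiential hints."""
    hints = []
    if pronunciation:
        hints.append(f"Pronunciation: {pronunciation}")      # e.g. "[막따]"
    if morphemes:
        hints.append("Morphemes: " + " + ".join(morphemes))  # e.g. "막- + -다"
    options = "\n".join(f"{i + 1}. {c}" for i, c in enumerate(choices))
    body = f"Question: {question}\n{options}\nAnswer:"
    return "\n".join(hints + [body]) if hints else body
```

Supplying cues like these mirrors, in a minimal way, the pronunciation information and morphological decomposition that the paper reports improved model performance.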
In conclusion, KoGEM's comprehensive, category-specific analysis of Korean grammar offers a clear view of current LLM limitations and of where improvement is possible, paving the way for more nuanced, linguistically adept AI systems.