Polishing Every Facet of the GEM: Testing Linguistic Competence of LLMs and Humans in Korean (2506.01237v1)

Published 2 Jun 2025 in cs.CL and cs.AI

Abstract: We introduce the Korean Grammar Evaluation Benchmark (KoGEM), designed to assess the linguistic competence of LLMs and humans in Korean. KoGEM consists of 1.5k multiple-choice QA pairs covering five main categories and 16 subcategories. The zero-shot evaluation of 27 LLMs of various sizes and types reveals that while LLMs perform remarkably well on straightforward tasks requiring primarily definitional knowledge, they struggle with tasks that demand the integration of real-world experiential knowledge, such as phonological rules and pronunciation. Furthermore, our in-depth analysis suggests that incorporating such experiential knowledge could enhance the linguistic competence of LLMs. With KoGEM, we not only highlight the limitations of current LLMs in linguistic competence but also uncover hidden facets of LLMs in linguistic competence, paving the way for enhancing comprehensive language understanding. Our code and dataset are available at: https://github.com/SungHo3268/KoGEM.

Summary

  • The paper introduces KoGEM, a benchmark comprising 1,524 multiple-choice questions that span five main categories of Korean grammar.
  • The paper finds that LLMs excel in morphology and semantics but underperform humans in phonology and pragmatics.
  • The paper demonstrates that integrating experiential knowledge into LLMs significantly improves performance, suggesting new avenues for similar multilingual benchmarks.

Evaluation of Linguistic Competence in LLMs and Humans using KoGEM

This paper introduces the Korean Grammar Evaluation Benchmark (KoGEM), designed to evaluate the linguistic competence of LLMs and human participants in Korean. KoGEM comprises 1,524 multiple-choice questions covering diverse aspects of Korean grammar, organized into five main categories: Phonology, Morphology, Syntax, Semantics, and Norms, further refined into 16 subcategories.
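
To make the evaluation setup concrete, below is a minimal sketch of a zero-shot multiple-choice evaluation loop over KoGEM-style items, reporting per-category accuracy as in the paper's category-level analysis. The JSON field names ("question", "choices", "answer", "category") and the query_llm helper are illustrative assumptions; the authors' actual harness is in the linked repository.

```python
# Minimal sketch of a zero-shot multiple-choice evaluation over
# KoGEM-style items. Field names and query_llm() are illustrative
# assumptions, not the authors' actual code.
import json
from collections import defaultdict

def query_llm(prompt: str) -> str:
    """Placeholder for an LLM API call returning the model's raw reply."""
    raise NotImplementedError

def evaluate(dataset_path: str) -> dict:
    correct = defaultdict(int)
    total = defaultdict(int)
    with open(dataset_path, encoding="utf-8") as f:
        items = json.load(f)
    for item in items:
        # Format the choices as a numbered list for the prompt.
        choices = "\n".join(
            f"{i + 1}. {c}" for i, c in enumerate(item["choices"])
        )
        prompt = (
            f"{item['question']}\n{choices}\n"
            "Answer with the number of the correct choice only."
        )
        reply = query_llm(prompt).strip()
        total[item["category"]] += 1
        # Assumes "answer" stores the 1-based index of the correct choice.
        if reply.startswith(str(item["answer"])):
            correct[item["category"]] += 1
    # Per-category accuracy, mirroring the paper's category-level analysis.
    return {cat: correct[cat] / total[cat] for cat in total}
```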

The motivation behind KoGEM is the need to assess whether LLMs genuinely comprehend linguistic principles, beyond pattern recognition over large training corpora. While LLMs have demonstrated substantial capabilities in language processing tasks, their grasp of nuanced linguistic phenomena remains subject to scrutiny, particularly when measured against human linguistic intuition and experiential knowledge derived from real-world language use.

Key Findings

  1. Performance Across Categories: LLMs exhibit markedly uneven performance across grammatical categories. They excel in Morphology and in aspects of Semantics, where pattern recognition and definitional knowledge dominate, but perform suboptimally in Phonology, where experiential understanding and multimodal reasoning are crucial.
  2. Human vs. LLM Performance: Human participants consistently outperform LLMs in categories that require intuitive reasoning, such as phonological alternations and pragmatics. The paper attributes this gap to humans' ability to draw naturally on spoken-language experience and internalized phonological rules, capabilities that current LLMs lack.
  3. Incorporation of Experiential Knowledge: Augmenting LLMs with experiential knowledge analogous to human cognitive processes, such as pronunciation information and morphological decomposition, yielded significant performance improvements (a sketch of this kind of augmentation follows this list). This points to embedding real-world experiential knowledge in LLM training and prompting as a route to stronger linguistic competence.
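
As an illustration of finding 3, the following is a minimal sketch of how pronunciation and morphological information might be prepended to a question before querying a model. The function name, parameters, and prompt format are illustrative assumptions, not the authors' exact method (their code is in the linked repository).

```python
# Hedged sketch of "experiential knowledge" prompt augmentation:
# prepend pronunciation and morpheme information to a KoGEM question.
# Helper names and prompt layout are assumptions for illustration.
from typing import Optional

def augment_prompt(question: str,
                   pronunciation: Optional[str] = None,
                   morphemes: Optional[list] = None) -> str:
    context_lines = []
    if pronunciation:
        # e.g. the standard pronunciation of the word the question targets
        context_lines.append(f"Pronunciation: {pronunciation}")
    if morphemes:
        # e.g. decomposition of the target word into its morphemes
        context_lines.append("Morphemes: " + " + ".join(morphemes))
    context = "\n".join(context_lines)
    return f"{context}\n\n{question}" if context else question
```

The augmented prompt would then be passed to the same zero-shot evaluation loop shown earlier, letting per-category accuracy be compared with and without the added context.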

Implications and Future Directions

The KoGEM benchmark serves as a detailed framework for understanding the linguistic competence of LLMs and identifying areas where they may benefit from further refinement and real-world grounding. This research can inform future developments in AI, particularly in designing LLMs that more closely mimic human-like language processing. By extending similar benchmarks to other languages, researchers could strengthen the multilingual capacities of LLMs and capture broader linguistic nuances.

Moreover, these findings highlight the potential value of integrating multimodal learning and experiential knowledge embeddings into LLM training regimens. This aligns with the theoretical proposition that linguistic competence should not solely depend on extensive text-only datasets but should encompass richer, context-aware cognitive paradigms akin to human language acquisition.

In conclusion, KoGEM's comprehensive, category-specific analysis of Korean grammar provides an essential glimpse into the limitations and prospective areas for enhancement within LLMs, paving the way for more nuanced, linguistically adept AI systems.
