
Rich Character-Level Information for Korean Morphological Analysis and Part-of-Speech Tagging (1806.10771v1)

Published 28 Jun 2018 in cs.CL

Abstract: Because Korean is a highly agglutinative, character-rich language, previous work on Korean morphological analysis typically either uses sub-character features known as graphemes or relies on comprehensive prior linguistic knowledge (i.e., a dictionary of known morphological transformation forms, or actions). These models were built on the assumption that character-level, dictionary-less morphological analysis was intractable because of the number of actions required. In this study, we present a multi-stage action-based model that performs morphological transformation and part-of-speech tagging on arbitrary units of input, and we apply it to character-level Korean morphological analysis. Among models that do not use prior linguistic knowledge, our proposed data-driven Bi-LSTM model achieves state-of-the-art word-level and sentence-level tagging accuracy on the Sejong Korean corpus.
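
To make the "sub-character features" mentioned in the abstract concrete, the following is a minimal sketch of how a precomposed Hangul syllable can be split into its component graphemes (jamo) using standard Unicode code point arithmetic. It is only an illustration of the kind of decomposition that earlier grapheme-based approaches rely on; it is not taken from the paper, and the function name `decompose_syllable` is hypothetical. The paper's own model works on whole characters instead.

```python
# Unicode layout of precomposed Hangul syllables (U+AC00..U+D7A3):
# code = 0xAC00 + (lead * 21 + vowel) * 28 + tail
HANGUL_BASE = 0xAC00
NUM_LEADS = 19
NUM_VOWELS = 21
NUM_TAILS = 28  # includes index 0, meaning "no final consonant"

LEAD_BASE = 0x1100   # first conjoining initial consonant
VOWEL_BASE = 0x1161  # first conjoining vowel
TAIL_BASE = 0x11A7   # one below the first conjoining final consonant


def decompose_syllable(ch):
    """Split one Hangul syllable into jamo; return other characters unchanged.

    Hypothetical helper for illustration only.
    """
    index = ord(ch) - HANGUL_BASE
    if not 0 <= index < NUM_LEADS * NUM_VOWELS * NUM_TAILS:
        return [ch]  # not a precomposed Hangul syllable
    lead, rest = divmod(index, NUM_VOWELS * NUM_TAILS)
    vowel, tail = divmod(rest, NUM_TAILS)
    jamo = [chr(LEAD_BASE + lead), chr(VOWEL_BASE + vowel)]
    if tail:  # tail index 0 means the syllable has no final consonant
        jamo.append(chr(TAIL_BASE + tail))
    return jamo
```

A single syllable block therefore expands into two or three graphemes, which lengthens the input sequence; the character-level setting described in the abstract avoids this expansion.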

Authors (4)
  1. Andrew Matteson (1 paper)
  2. Chanhee Lee (14 papers)
  3. Young-Bum Kim (22 papers)
  4. Heuiseok Lim (49 papers)
Citations (16)
