User-Generated Text Corpus for Evaluating Japanese Morphological Analysis and Lexical Normalization (2104.03523v1)

Published 8 Apr 2021 in cs.CL

Abstract: Morphological analysis (MA) and lexical normalization (LN) are both important tasks for Japanese user-generated text (UGT). To evaluate and compare different MA/LN systems, we have constructed a publicly available Japanese UGT corpus. Our corpus comprises 929 sentences annotated with morphological and normalization information, along with category information we classified for frequent UGT-specific phenomena. Experiments on the corpus demonstrated the low performance of existing MA/LN methods for non-general words and non-standard forms, indicating that the corpus would be a challenging benchmark for further research on UGT.

Authors (4)
  1. Shohei Higashiyama (5 papers)
  2. Masao Utiyama (39 papers)
  3. Taro Watanabe (76 papers)
  4. Eiichiro Sumita (31 papers)
Citations (4)
