J-UniMorph: Japanese Morphological Annotation through the Universal Feature Schema (2402.14411v1)

Published 22 Feb 2024 in cs.CL

Abstract: We introduce J-UniMorph, a Japanese morphology dataset built on the UniMorph feature schema. The dataset covers the rich verb forms that arise from the agglutinative nature of Japanese. J-UniMorph differs from the existing Japanese subset of UniMorph, which was extracted automatically from Wiktionary. This Wiktionary Edition averages around 12 inflected forms per word and is dominated by denominal verbs (i.e., [noun] + suru (do-PRS)), which are morphologically equivalent to the verb suru (do). In contrast, J-UniMorph covers a much broader range of verb forms, including more frequently used ones, and averages 118 inflected forms per word. It includes honorifics, a range of politeness levels, and other linguistic nuances characteristic of Japanese. This paper presents detailed statistics and characteristics of J-UniMorph and compares it with the Wiktionary Edition. We make J-UniMorph and its interactive visualizer publicly available to support cross-linguistic research and various applications.
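The "inflected forms per word" figures above are averages over lemma entries. As a rough illustration of how such an average could be computed, here is a minimal Python sketch. It assumes data in the common three-column UniMorph layout (lemma, inflected form, semicolon-separated feature tags, separated by tabs). The rows in `SAMPLE` and the helper names `parse_unimorph` and `avg_forms_per_lemma` are hypothetical. The rows are toy romanized examples and are not taken from J-UniMorph or the Wiktionary Edition, and the tags may not match the dataset's actual tag choices. Counting distinct (form, features) pairs per lemma is also an assumption about how the paper counts forms.

```python
from collections import defaultdict

# Hypothetical toy rows in the three-column UniMorph layout:
# lemma<TAB>inflected form<TAB>feature tags separated by ';'.
# Illustrative only: NOT taken from J-UniMorph or the Wiktionary Edition.
SAMPLE = (
    "taberu\ttabeta\tV;PST\n"
    "taberu\ttabenai\tV;NEG;PRS\n"
    "taberu\ttabemashita\tV;PST;POL\n"
    "suru\tshita\tV;PST\n"
)


def parse_unimorph(text):
    """Parse UniMorph-style TSV into (lemma, form, feature tuple) triples.

    Blank lines are skipped; every other line must have exactly three
    tab-separated fields.
    """
    triples = []
    for line in text.splitlines():
        if not line.strip():
            continue
        lemma, form, feats = line.split("\t")
        triples.append((lemma, form, tuple(feats.split(";"))))
    return triples


def avg_forms_per_lemma(triples):
    """Average number of distinct (form, features) entries per lemma.

    Assumption: each distinct (form, feature bundle) pair counts as one
    inflected form. Returns 0.0 for empty input.
    """
    per_lemma = defaultdict(set)
    for lemma, form, feats in triples:
        per_lemma[lemma].add((form, feats))
    if not per_lemma:
        return 0.0
    return sum(len(entries) for entries in per_lemma.values()) / len(per_lemma)


if __name__ == "__main__":
    triples = parse_unimorph(SAMPLE)
    # taberu has 3 entries and suru has 1, so the average is 4 / 2.
    print(avg_forms_per_lemma(triples))  # 2.0
```

On a real UniMorph file, the same two functions would be applied to the file contents read as text. A dataset dominated by a single paradigm (such as the suru-based denominal verbs noted above) would show up as many lemmas sharing nearly identical feature bundles.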
