Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
119 tokens/sec
GPT-4o
56 tokens/sec
Gemini 2.5 Pro Pro
43 tokens/sec
o3 Pro
6 tokens/sec
GPT-4.1 Pro
47 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Long Short-Term Memory for Japanese Word Segmentation (1709.08011v3)

Published 23 Sep 2017 in cs.CL

Abstract: This study presents a Long Short-Term Memory (LSTM) neural network approach to Japanese word segmentation (JWS). Previous studies on Chinese word segmentation (CWS) succeeded in using recurrent neural networks such as LSTM and gated recurrent units (GRU). However, in contrast to Chinese, Japanese includes several character types, such as hiragana, katakana, and kanji, that produce orthographic variations and increase the difficulty of word segmentation. Additionally, it is important for JWS tasks to consider a global context, and yet traditional JWS approaches rely on local features. In order to address this problem, this study proposes employing an LSTM-based approach to JWS. The experimental results indicate that the proposed model achieves state-of-the-art accuracy with respect to various Japanese corpora.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (2)
  1. Yoshiaki Kitagawa (1 paper)
  2. Mamoru Komachi (40 papers)
Citations (25)

Summary

We haven't generated a summary for this paper yet.