
Neural Chinese Word Segmentation with Dictionary Knowledge (1807.05849v1)

Published 11 Jul 2018 in cs.CL, cs.LG, and stat.ML

Abstract: Chinese word segmentation (CWS) is an important task for Chinese NLP. Recently, many neural-network-based methods have been proposed for CWS. However, these methods require a large number of labeled sentences for model training, and usually cannot exploit the useful information in Chinese dictionaries. In this paper, we propose two methods to exploit dictionary information for CWS. The first is based on pseudo-labeled data generation, and the second is based on multi-task learning. Experimental results on two benchmark datasets validate that our approach can effectively improve the performance of Chinese word segmentation, especially when training data is insufficient.
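
The abstract does not spell out how pseudo-labeled data is generated, so the following is only a minimal sketch of one plausible reading: sample words from a dictionary, concatenate them into a pseudo sentence, and derive a character-level tag sequence from the known word boundaries. The BMES tagging scheme, the function names `bmes_tags` and `make_pseudo_sentence`, and the toy dictionary are all assumptions made for illustration, not details taken from the paper.

```python
import random


def bmes_tags(word):
    """Return one tag per character: S for a single-character word,
    otherwise B (begin), M (middle) ..., E (end). Assumed scheme."""
    if len(word) == 1:
        return ["S"]
    return ["B"] + ["M"] * (len(word) - 2) + ["E"]


def make_pseudo_sentence(dictionary, num_words, rng):
    """Sample `num_words` dictionary entries and join them into a
    pseudo sentence with character-level segmentation labels."""
    words = [rng.choice(dictionary) for _ in range(num_words)]
    chars = [ch for word in words for ch in word]
    tags = [tag for word in words for tag in bmes_tags(word)]
    return chars, tags


# Hypothetical toy dictionary; a real one would hold many more entries.
toy_dictionary = ["中国", "人", "计算机", "学习"]
rng = random.Random(0)  # fixed seed so the sketch is reproducible
chars, tags = make_pseudo_sentence(toy_dictionary, 3, rng)
for ch, tag in zip(chars, tags):
    print(ch, tag)
```

Pairs produced this way could be mixed into the labeled training set of a sequence-labeling segmenter; how they are weighted against real sentences is left open here.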

Authors (5)
  1. Junxin Liu (3 papers)
  2. Fangzhao Wu (81 papers)
  3. Chuhan Wu (87 papers)
  4. Yongfeng Huang (110 papers)
  5. Xing Xie (220 papers)
Citations (53)
