
Neural Chinese Word Segmentation as Sequence to Sequence Translation (1911.12982v1)

Published 29 Nov 2019 in cs.CL

Abstract: Recently, Chinese word segmentation (CWS) methods using neural networks have made impressive progress. Most of them regard CWS as a sequence labeling problem and construct models based on local features rather than considering global information of the input sequence. In this paper, we cast CWS as a sequence translation problem and propose a novel sequence-to-sequence CWS model with an attention-based encoder-decoder framework. The model captures global information from the input and directly outputs the segmented sequence. It can also tackle other NLP tasks jointly with CWS in an end-to-end mode. Experiments on the Weibo, PKU and MSRA benchmark datasets show that our approach achieves competitive performance compared with state-of-the-art methods. Meanwhile, we successfully apply the proposed model to jointly learning CWS and Chinese spelling correction, which demonstrates its applicability to multi-task fusion.
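To make the "segmentation as translation" framing concrete, here is a minimal sketch (not the authors' code): the source sequence is the raw character string and the target sequence is the same characters with an explicit boundary token inserted after each word, decoded by an attention-based encoder-decoder. The GRU layers, hidden sizes, dot-product attention, and the `<sep>` boundary scheme are illustrative assumptions rather than the paper's exact architecture.

```python
# Sketch of casting Chinese word segmentation (CWS) as sequence-to-sequence
# translation. Assumptions: a "<sep>" boundary token marks word ends, and the
# model is a GRU encoder-decoder with dot-product attention.
import torch
import torch.nn as nn

BOUNDARY = "<sep>"

def to_translation_pair(sentence, segmented_words):
    """Source = raw characters; target = characters with <sep> after each word."""
    src = list(sentence)
    tgt = []
    for word in segmented_words:
        tgt.extend(list(word))
        tgt.append(BOUNDARY)
    return src, tgt

class Seq2SegModel(nn.Module):
    """Illustrative GRU encoder-decoder with dot-product attention."""
    def __init__(self, vocab_size, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True, bidirectional=True)
        self.enc_proj = nn.Linear(2 * hidden, hidden)
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(2 * hidden, vocab_size)

    def forward(self, src_ids, tgt_ids):
        # Encode the whole input so the decoder can attend to global context.
        enc_out, _ = self.encoder(self.embed(src_ids))                # (B, S, 2H)
        enc_out = self.enc_proj(enc_out)                              # (B, S, H)
        dec_out, _ = self.decoder(self.embed(tgt_ids))                # (B, T, H)
        # Dot-product attention over all encoder states.
        scores = torch.bmm(dec_out, enc_out.transpose(1, 2))          # (B, T, S)
        context = torch.bmm(torch.softmax(scores, dim=-1), enc_out)   # (B, T, H)
        return self.out(torch.cat([dec_out, context], dim=-1))        # (B, T, V)

# Usage: "我爱北京" segmented as ["我", "爱", "北京"] becomes
# src = ['我', '爱', '北', '京'], tgt = ['我', '<sep>', '爱', '<sep>', '北', '京', '<sep>'].
src, tgt = to_translation_pair("我爱北京", ["我", "爱", "北京"])
model = Seq2SegModel(vocab_size=100)
logits = model(torch.randint(0, 100, (1, 4)), torch.randint(0, 100, (1, 7)))
```

Because the decoder emits the full output sequence rather than per-character tags, the same framework can interleave other generation tasks (e.g. spelling correction) with segmentation, which is the joint-learning setting the abstract mentions.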

Authors (6)
  1. Xuewen Shi (2 papers)
  2. Heyan Huang (107 papers)
  3. Ping Jian (9 papers)
  4. Yuhang Guo (54 papers)
  5. Xiaochi Wei (12 papers)
  6. Yi-Kun Tang (3 papers)
Citations (6)