
Joint Chinese Word Segmentation and Part-of-speech Tagging via Two-stage Span Labeling (2112.09488v1)

Published 17 Dec 2021 in cs.CL

Abstract: Chinese word segmentation and part-of-speech tagging are fundamental tasks in computational linguistics and in natural language processing applications. Researchers still debate whether Chinese word segmentation and part-of-speech tagging are necessary in the deep learning era; nevertheless, resolving ambiguities and detecting unknown words remain challenging problems in this field. Previous studies on joint Chinese word segmentation and part-of-speech tagging mainly follow the character-based tagging paradigm, focusing on modeling n-gram features. Unlike previous work, we propose a neural model named SpanSegTag for joint Chinese word segmentation and part-of-speech tagging that follows a span-labeling formulation, in which the core problem is estimating the probability of each n-gram being a word with a given part-of-speech tag. We use a biaffine operation over the left and right boundary representations of consecutive characters to model the n-grams. Our experiments show that our BERT-based model SpanSegTag achieves competitive performance on the CTB5, CTB6, and UD benchmarks, and significant improvements on CTB7 and CTB9, compared with the current state-of-the-art methods using BERT or ZEN encoders.
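The biaffine span scoring mentioned in the abstract can be sketched as follows. This is a minimal illustration only: the function name, tensor shapes, and the bias-augmented formulation are assumptions for clarity, not the authors' actual SpanSegTag implementation.

```python
import numpy as np

def biaffine_span_scores(left, right, W):
    """Score every (start, end) span for every label with a biaffine form.

    left:  (n, d) left-boundary representations of characters (assumed)
    right: (n, d) right-boundary representations of characters (assumed)
    W:     (num_labels, d+1, d+1) biaffine weight tensor; the extra
           dimension holds a bias term, a common trick in biaffine parsers.

    Returns scores of shape (num_labels, n, n), where scores[k, i, j]
    is the score of the span with left boundary i and right boundary j
    carrying label k (e.g., a word with a given POS tag).
    """
    n, d = left.shape
    ones = np.ones((n, 1))
    L = np.concatenate([left, ones], axis=1)   # (n, d+1), bias-augmented
    R = np.concatenate([right, ones], axis=1)  # (n, d+1), bias-augmented
    # scores[k, i, j] = L[i] @ W[k] @ R[j]
    return np.einsum('id,kde,je->kij', L, W, R)
```

In a full model, these scores would be fed to a softmax over labels per span; here the sketch only shows the biaffine interaction between boundary representations.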

Authors (5)
  1. Duc-Vu Nguyen (18 papers)
  2. Linh-Bao Vo (2 papers)
  3. Ngoc-Linh Tran (3 papers)
  4. Kiet Van Nguyen (74 papers)
  5. Ngan Luu-Thuy Nguyen (56 papers)
Citations (6)
