
An Empirical Study of Discriminative Sequence Labeling Models for Vietnamese Text Processing (1708.09163v1)

Published 30 Aug 2017 in cs.CL

Abstract: This paper presents an empirical study of two widely used sequence prediction models, Conditional Random Fields (CRFs) and Long Short-Term Memory networks (LSTMs), on two fundamental tasks in Vietnamese text processing: part-of-speech tagging and named entity recognition. We show that a strong lower bound on labeling accuracy can be obtained by relying only on simple word-based features with minimal hand-crafted feature engineering: 90.65% and 86.03% on the standard test sets for the two tasks, respectively. In particular, we demonstrate empirically the surprising efficiency of word embeddings on both tasks and with both models. We point out that the state-of-the-art LSTM model does not always significantly outperform the traditional CRF model, especially on moderate-sized data sets. Finally, we offer suggestions and discussion on the efficient use of sequence labeling models in practical applications.

Authors (5)
  1. Phuong Le-Hong
  2. Minh Pham Quang Nhat
  3. Thai-Hoang Pham
  4. Tuan-Anh Tran
  5. Dang-Minh Nguyen
