
A Fast and Accurate Vietnamese Word Segmenter (1709.06307v2)

Published 19 Sep 2017 in cs.CL

Abstract: We propose a novel approach to Vietnamese word segmentation. Our approach is based on the Single Classification Ripple Down Rules methodology (Compton and Jansen, 1990), in which rules are stored in an exception structure and new rules are added only to correct segmentation errors made by existing rules. Experimental results on the benchmark Vietnamese treebank show that our approach outperforms the previous state-of-the-art approaches JVnSegmenter, vnTokenizer, DongDu and UETsegmenter in terms of both accuracy and speed. Our code is open-source and available at: https://github.com/datquocnguyen/RDRsegmenter.
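
To make the "exception structure" concrete, the following is a minimal sketch of how a Single Classification Ripple Down Rules tree can be evaluated. It is an illustration under stated assumptions, not the RDRsegmenter implementation: the `Node` class, the `classify` function, the `prev`/`cur` case fields, the B/I labels (begin-of-word / inside-word) and the two toy rules are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Node:
    """One rule in a hypothetical SCRDR tree."""
    condition: Callable[[dict], bool]      # does the rule fire on this case?
    conclusion: str                        # label proposed when it fires
    except_child: Optional["Node"] = None  # tried next if this rule fires
    if_not_child: Optional["Node"] = None  # tried next if this rule does not fire


def classify(root: Node, case: dict) -> Optional[str]:
    """Return the conclusion of the last rule that fired along the path."""
    node, result = root, None
    while node is not None:
        if node.condition(case):
            result = node.conclusion       # a more specific rule overrides earlier ones
            node = node.except_child
        else:
            node = node.if_not_child
    return result


# Toy tree (assumed labels): default rule says every syllable begins a word ("B").
root = Node(lambda c: True, "B")
# Exception rules, each imagined as added to correct one observed error of the default.
root.except_child = Node(
    lambda c: c["prev"] == "Việt" and c["cur"] == "Nam", "I"
)
root.except_child.if_not_child = Node(
    lambda c: c["prev"] == "sinh" and c["cur"] == "viên", "I"
)

label_vietnam = classify(root, {"prev": "Việt", "cur": "Nam"})
label_sinhvien = classify(root, {"prev": "sinh", "cur": "viên"})
label_default = classify(root, {"prev": "tôi", "cur": "là"})
```

In this sketch, a case that matches no exception falls back to the default conclusion, and adding a new rule never changes the conditions of existing ones; it only attaches a new node at the point where a wrong conclusion was reached.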

Authors (5)
  1. Dat Quoc Nguyen (55 papers)
  2. Dai Quoc Nguyen (26 papers)
  3. Thanh Vu (59 papers)
  4. Mark Dras (38 papers)
  5. Mark Johnson (46 papers)
Citations (61)
