
Sentence Segmentation for Classical Chinese Based on LSTM with Radical Embedding (1810.03479v1)

Published 5 Oct 2018 in cs.CL and cs.LG

Abstract: In this paper, we develop a sub-character feature embedding called radical embedding and apply it to an LSTM model for sentence segmentation of pre-modern Chinese texts. The dataset includes over 150 classical Chinese books from three different dynasties and covers different literary styles. The LSTM-CRF model is a state-of-the-art method for sequence labeling problems. Our new model adds a radical-embedding component, which leads to improved performance. Experimental results on the aforementioned Chinese books demonstrate better accuracy than earlier sentence-segmentation methods, especially on Tang epitaph texts.
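The core idea in the abstract is that each character's embedding is augmented with an embedding of its radical (the character's semantic component) before the sequence is fed to the LSTM-CRF tagger. A minimal sketch of that input representation is below; the toy vocabulary, the character-to-radical table, and all dimensions are illustrative assumptions, not the paper's actual setup.

```python
import numpy as np

# Sketch of radical-augmented input features: each character vector is the
# concatenation of a character embedding and its radical's embedding.
# Vocabulary, radical mapping, and dimensions are hypothetical.

rng = np.random.default_rng(0)

char_vocab = {"學": 0, "習": 1, "之": 2}          # toy character vocabulary
radical_of = {"學": "子", "習": "羽", "之": "丿"}  # toy character -> radical map
radical_vocab = {"子": 0, "羽": 1, "丿": 2}

CHAR_DIM, RAD_DIM = 4, 2
char_emb = rng.normal(size=(len(char_vocab), CHAR_DIM))
rad_emb = rng.normal(size=(len(radical_vocab), RAD_DIM))

def embed(sentence):
    """Return one (CHAR_DIM + RAD_DIM)-dim feature vector per character."""
    rows = []
    for ch in sentence:
        c = char_emb[char_vocab[ch]]                  # character embedding
        r = rad_emb[radical_vocab[radical_of[ch]]]    # radical embedding
        rows.append(np.concatenate([c, r]))
    return np.stack(rows)

X = embed("學習之")
print(X.shape)  # (3, 6): sequence length 3, concatenated dim 4 + 2
```

In the full model, a matrix like `X` would be consumed by a (bi)LSTM whose per-character outputs are scored by a CRF layer that predicts segmentation labels; both embedding tables would be trained jointly with the tagger rather than fixed at random.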

Authors (5)
  1. Xu Han (270 papers)
  2. Hongsu Wang (7 papers)
  3. Sanqian Zhang (2 papers)
  4. Qunchao Fu (1 paper)
  5. Jun S. Liu (49 papers)
Citations (15)