Glyph-aware Embedding of Chinese Characters (1709.00028v1)

Published 31 Aug 2017 in cs.CL and cs.LG

Abstract: Given the advantage and recent success of English character-level and subword-unit models in several NLP tasks, we consider the equivalent modeling problem for Chinese. Chinese script is logographic and many Chinese logograms are composed of common substructures that provide semantic, phonetic and syntactic hints. In this work, we propose to explicitly incorporate the visual appearance of a character's glyph in its representation, resulting in a novel glyph-aware embedding of Chinese characters. Being inspired by the success of convolutional neural networks in computer vision, we use them to incorporate the spatio-structural patterns of Chinese glyphs as rendered in raw pixels. In the context of two basic Chinese NLP tasks of language modeling and word segmentation, the model learns to represent each character's task-relevant semantic and syntactic information in the character-level embedding.
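The abstract describes a pipeline in which each character is rendered as a small pixel bitmap and passed through a convolutional network to obtain its embedding. The sketch below illustrates one plausible way such a glyph-aware embedding module could look; the class name `GlyphCNNEmbedding`, the layer sizes, and the glyph resolution are assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class GlyphCNNEmbedding(nn.Module):
    """Hypothetical sketch of a glyph-aware character embedding.

    Each Chinese character is rendered as a grayscale bitmap and passed
    through a small CNN; the flattened feature map is projected to a
    fixed-size vector that serves as the character embedding for
    downstream tasks such as language modeling or word segmentation.
    Layer sizes and glyph resolution are illustrative only.
    """

    def __init__(self, glyph_size: int = 24, embed_dim: int = 128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),  # low-level stroke features
            nn.ReLU(),
            nn.MaxPool2d(2),                             # glyph_size -> glyph_size / 2
            nn.Conv2d(32, 64, kernel_size=3, padding=1), # spatio-structural patterns
            nn.ReLU(),
            nn.MaxPool2d(2),                             # -> glyph_size / 4
        )
        feat_dim = 64 * (glyph_size // 4) ** 2
        self.proj = nn.Linear(feat_dim, embed_dim)       # fixed-size character embedding

    def forward(self, glyphs: torch.Tensor) -> torch.Tensor:
        # glyphs: (batch, 1, glyph_size, glyph_size) rendered character bitmaps
        h = self.conv(glyphs)
        return self.proj(h.flatten(start_dim=1))


# Usage: embed a batch of 8 rendered glyphs with pixel values in [0, 1].
glyphs = torch.rand(8, 1, 24, 24)
embeddings = GlyphCNNEmbedding()(glyphs)  # shape: (8, 128)
```

In practice, these per-character embeddings would feed a sequence model (e.g. an RNN) for the language modeling and word segmentation tasks mentioned in the abstract; that downstream component is not shown here.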

Authors (2)
  1. Falcon Z. Dai (9 papers)
  2. Zheng Cai (157 papers)
Citations (39)
