
Detect Camouflaged Spam Content via StoneSkipping: Graph and Text Joint Embedding for Chinese Character Variation Representation (1908.11561v1)

Published 30 Aug 2019 in cs.CL

Abstract: Chinese text spam detection is challenging because spammers exploit both glyph and phonetic variations of Chinese characters. This paper proposes a novel framework that jointly models Chinese variational, semantic, and contextualized representations for the Chinese text spam detection task. In particular, a Variation Family-enhanced Graph Embedding (VFGE) algorithm is designed based on a Chinese character variation graph. VFGE learns both the graph embeddings of the Chinese characters (local) and the latent variation families (global). Furthermore, an enhanced bidirectional language model, with a combination gate function and an aggregation learning function, is proposed to integrate the graph and text information while capturing sequential information. Extensive experiments on both SMS and review datasets show that the proposed method outperforms a series of state-of-the-art models for Chinese spam detection.
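
The abstract mentions a combination gate that integrates graph and text information but does not give its form. As a rough illustration only, the sketch below shows a generic element-wise sigmoid gate that blends a graph embedding with a text embedding of the same size. The function name `combination_gate`, the `weights` and `bias` parameters, and the gating formula itself are assumptions chosen for illustration and are not taken from the paper.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def combination_gate(graph_vec, text_vec, weights, bias):
    """Blend two equal-length embeddings with an element-wise gate.

    Hypothetical formulation (not the paper's):
        gate_i = sigmoid(weights[i] . [graph_vec; text_vec] + bias[i])
        out_i  = gate_i * graph_vec[i] + (1 - gate_i) * text_vec[i]
    """
    if len(graph_vec) != len(text_vec):
        raise ValueError("graph_vec and text_vec must have the same length")
    concat = list(graph_vec) + list(text_vec)
    out = []
    for i, (g, t) in enumerate(zip(graph_vec, text_vec)):
        # One weight row per output dimension, applied to the concatenation.
        z = sum(w * x for w, x in zip(weights[i], concat)) + bias[i]
        gate = sigmoid(z)
        out.append(gate * g + (1.0 - gate) * t)
    return out


# Toy inputs: with zero weights and bias, every gate is 0.5, so the
# output is the plain average of the two embeddings.
graph_embedding = [1.0, 2.0]
text_embedding = [3.0, 4.0]
zero_weights = [[0.0] * 4 for _ in range(2)]
zero_bias = [0.0, 0.0]
blended = combination_gate(graph_embedding, text_embedding, zero_weights, zero_bias)
```

In a trained model the weights and bias would be learned, letting the gate lean on the graph embedding for characters that are likely variants and on the text embedding otherwise. That reading is an interpretation of the abstract, not a claim about the authors' implementation.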

Authors (8)
  1. Zhuoren Jiang (24 papers)
  2. Zhe Gao (13 papers)
  3. Guoxiu He (15 papers)
  4. Yangyang Kang (32 papers)
  5. Changlong Sun (37 papers)
  6. Qiong Zhang (56 papers)
  7. Luo Si (73 papers)
  8. Xiaozhong Liu (71 papers)
Citations (17)
