Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
119 tokens/sec
GPT-4o
56 tokens/sec
Gemini 2.5 Pro Pro
43 tokens/sec
o3 Pro
6 tokens/sec
GPT-4.1 Pro
47 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Exploiting Language Model for Efficient Linguistic Steganalysis (2107.12168v3)

Published 26 Jul 2021 in cs.CL and cs.MM

Abstract: Recent advances in linguistic steganalysis have successively applied CNN, RNN, GNN and other efficient deep models for detecting secret information in generative texts. These methods tend to seek stronger feature extractors to achieve higher steganalysis effects. However, we have found through experiments that there actually exists significant difference between automatically generated stego texts and carrier texts in terms of the conditional probability distribution of individual words. Such kind of difference can be naturally captured by the LLM used for generating stego texts. Through further experiments, we conclude that this ability can be transplanted to a text classifier by pre-training and fine-tuning to improve the detection performance. Motivated by this insight, we propose two methods for efficient linguistic steganalysis. One is to pre-train a LLM based on RNN, and the other is to pre-train a sequence autoencoder. The results indicate that the two methods have different degrees of performance gain compared to the randomly initialized RNN, and the convergence speed is significantly accelerated. Moreover, our methods achieved the best performance compared to related works, while providing a solution for real-world scenario where there are more cover texts than stego texts.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (4)
  1. Biao Yi (11 papers)
  2. Hanzhou Wu (36 papers)
  3. Guorui Feng (9 papers)
  4. Xinpeng Zhang (86 papers)
Citations (5)

Summary

We haven't generated a summary for this paper yet.