Linguistic Characteristics of Censorable Language on SinaWeibo (1807.03654v1)
Published 10 Jul 2018 in cs.CL
Abstract: This paper investigates censorship from a linguistic perspective. We collect a corpus of censored and uncensored posts on a number of topics and build a classifier that predicts censorship decisions independent of discussion topic. Our investigation reveals that the strongest linguistic indicator of censored content in our corpus is its readability.