Exposing propaganda: an analysis of stylistic cues comparing human annotations and machine classification (2402.03780v3)
Abstract: This paper investigates the language of propaganda and its stylistic features. It presents the PPN (Propagandist Pseudo-News) dataset, a multisource, multilingual, multimodal dataset composed of news articles extracted from websites identified as propaganda sources by expert agencies. A limited sample from this set was randomly mixed with articles from the regular French press, with their URLs masked, to conduct a human annotation experiment using 11 distinct labels. The results show that human annotators were able to reliably discriminate between the two types of press across each of the labels. We propose different NLP techniques to identify the cues used by the annotators and to compare them with machine classification. These include the VAGO analyzer, which measures discourse vagueness and subjectivity, TF-IDF as a baseline, and four classifiers: two RoBERTa-based models, CATS, which uses syntax, and an XGBoost model combining syntactic and semantic features.
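To make the idea of a TF-IDF baseline concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The abstract does not say which classifier sits on top of the TF-IDF features, so a nearest-centroid rule with cosine similarity is used here as a stand-in. The smoothed IDF formula, the whitespace tokenizer, all function names, and the four toy sentences (invented, and in English rather than French) are assumptions made for illustration only.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: naive lowercase whitespace tokenization.
    return text.lower().split()


def fit_idf(docs):
    # Smoothed inverse document frequency: log((1 + n) / (1 + df)) + 1.
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf(doc, idf):
    # L2-normalized TF-IDF vector stored as a sparse dict; unseen terms are dropped.
    counts = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def train(docs, labels):
    # Returns the IDF table and one mean TF-IDF vector (centroid) per label.
    idf = fit_idf(docs)
    grouped = {}
    for doc, label in zip(docs, labels):
        grouped.setdefault(label, []).append(tfidf(doc, idf))
    centroids = {}
    for label, vectors in grouped.items():
        total = Counter()
        for vec in vectors:
            for t, w in vec.items():
                total[t] += w
        centroids[label] = {t: w / len(vectors) for t, w in total.items()}
    return idf, centroids


def predict(doc, idf, centroids):
    # Assign the label whose centroid is most similar to the document.
    vec = tfidf(doc, idf)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Invented toy data, NOT taken from the PPN dataset.
toy_docs = [
    "the shocking truth they hide from you",
    "everyone knows the corrupt elites always lie",
    "the ministry published the annual budget report on tuesday",
    "the court ruled on the appeal after a two day hearing",
]
toy_labels = ["propaganda", "propaganda", "regular", "regular"]

idf, centroids = train(toy_docs, toy_labels)
print(predict("the elites hide the shocking truth", idf, centroids))
print(predict("the court published the report", idf, centroids))
```

A baseline of this kind only sees word frequencies, which is why it is a useful point of comparison for classifiers that draw on syntactic or semantic features.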