
Exposing propaganda: an analysis of stylistic cues comparing human annotations and machine classification (2402.03780v3)

Published 6 Feb 2024 in cs.CL, cs.AI, and cs.LG

Abstract: This paper investigates the language of propaganda and its stylistic features. It presents the PPN dataset, standing for Propagandist Pseudo-News: a multisource, multilingual, multimodal dataset of news articles extracted from websites identified as propaganda sources by expert agencies. A limited sample from this set was randomly mixed with articles from the regular French press, with their URLs masked, to conduct an annotation experiment in which human annotators applied 11 distinct labels. The results show that the annotators were able to reliably discriminate between the two types of press on each of the labels. We propose several NLP techniques to identify the cues used by the annotators and to compare them with machine classification: the analyzer VAGO, which measures discourse vagueness and subjectivity; a TF-IDF model serving as a baseline; and four classifiers, namely two RoBERTa-based models, CATS, which uses syntax, and an XGBoost model combining syntactic and semantic features.
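
The TF-IDF baseline mentioned in the abstract can be sketched as follows. This is an illustrative sketch only, not the paper's code: the toy texts, the labels, and the nearest-neighbour decision rule are hypothetical stand-ins for the actual experimental setup.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """One TF-IDF weight dict per document (length-normalized TF, log IDF)."""
    n = len(docs)
    tokenized = [d.lower().split() for d in docs]
    # document frequency: in how many docs each token appears
    df = Counter(t for toks in tokenized for t in set(toks))
    return [
        {t: (c / len(toks)) * math.log(n / df[t]) for t, c in Counter(toks).items()}
        for toks in tokenized
    ]

def cosine(u, v):
    """Cosine similarity between two sparse weight dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def classify(train_texts, train_labels, query):
    # Nearest labelled document under cosine similarity (IDF is computed
    # over the training texts plus the query, for simplicity).
    vecs = tfidf_vectors(train_texts + [query])
    qvec, train_vecs = vecs[-1], vecs[:-1]
    sims = [cosine(qvec, v) for v in train_vecs]
    return train_labels[sims.index(max(sims))]

# Hypothetical toy examples: 1 = propagandist pseudo-news, 0 = regular press.
texts = [
    "shocking hidden truth about the crisis",
    "officials confirmed the figures on monday",
    "they hide the shocking truth from you",
    "the committee published its report tuesday",
]
labels = [1, 0, 1, 0]

print(classify(texts, labels, "the shocking truth they hide"))  # → 1
```

A real baseline of this kind would be trained on the full PPN-versus-regular-press sample with a proper classifier on top of the TF-IDF features; the sketch only shows the representation itself.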

References (21)
  1. Detecting opinion spams and fake news using text classification. Security and Privacy, 1(1):e9.
  2. Tianqi Chen and Carlos Guestrin. 2016. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’16, pages 785–794, New York, NY, USA. ACM.
  3. Giovanni Da San Martino, Alberto Barrón-Cedeño, Henning Wachsmuth, Rostislav Petrov, and Preslav Nakov. 2020. SemEval-2020 task 11: Detection of propaganda techniques in news articles. In Proceedings of the Fourteenth Workshop on Semantic Evaluation, pages 1377–1414, Barcelona (online). International Committee for Computational Linguistics.
  4. A survey on computational propaganda detection. CoRR, abs/2007.08024.
  5. Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv:1810.04805 [cs].
  6. Paul Égré and Benjamin Icard. 2018. Lying and vagueness. In J. Meibauer, editor, Oxford Handbook of Lying. OUP.
  7. A novel hybrid approach for text encoding: Cognitive attention to syntax model to detect online misinformation. Data & Knowledge Engineering, 148:102230.
  8. Paul Grice. 1975. Logic and conversation. In Speech acts, pages 41–58. Brill.
  9. Combining vagueness detection with deep learning to identify fake news. In IEEE 24th International Conference on Information Fusion (FUSION), pages 1–8.
  10. news-please: A generic news crawler and extractor. In Proceedings of the 15th International Symposium of Information Science, pages 218–223.
  11. Analysing state-backed propaganda websites: a new dataset and linguistic study.
  12. Benjamin Horne and Sibel Adali. 2017. This just in: Fake news packs a lot in title, uses simpler, repetitive content in text body, more similar to satire than real news. In Proceedings of the international AAAI conference on web and social media, volume 11, pages 759–766.
  13. VAGO: un outil en ligne de mesure du vague et de la subjectivité. In Conférence Nationale sur les Applications Pratiques de l’Intelligence Artificielle (PFIA 2022), pages 68–71.
  14. Measuring vagueness and subjectivity in texts: from symbolic to neural VAGO. In IEEE International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT 2023).
  15. Garth S. Jowett and Victoria O’Donnell. 2019. Propaganda & Persuasion, 7th edition. Sage Publications.
  16. Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. arXiv:1907.11692.
  17. Scott M Lundberg and Su-In Lee. 2017. A unified approach to interpreting model predictions. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems 30, pages 4765–4774. Curran Associates, Inc.
  18. Divisive language and propaganda detection using multi-head attention transformers with deep learning BERT-based language models for binary classification. In Proceedings of the Second Workshop on Natural Language Processing for Internet Freedom: Censorship, Disinformation, and Propaganda, pages 103–106, Hong Kong, China. Association for Computational Linguistics.
  19. Louis Martin, Benjamin Muller, Pedro Javier Ortiz Suárez, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah, and Benoît Sagot. 2020. CamemBERT: a tasty French language model. arXiv preprint.
  20. Anne Quaranto and Jason Stanley. 2021. Propaganda. In Justin Khoo and Rachel Katharine Sterken, editors, The Routledge Handbook of Social and Political Philosophy of Language, pages 125–146.
  21. Kai Shu, Deepak Mahudeswaran, Suhang Wang, Dongwon Lee, and Huan Liu. 2018. FakeNewsNet: A data repository with news content, social context and dynamic information for studying fake news on social media. arXiv preprint.