
Analysing State-Backed Propaganda Websites: a New Dataset and Linguistic Study (2310.14032v1)

Published 21 Oct 2023 in cs.CL

Abstract: This paper analyses two hitherto unstudied sites sharing state-backed disinformation, Reliable Recent News (rrn.world) and WarOnFakes (waronfakes.com), which publish content in Arabic, Chinese, English, French, German, and Spanish. We describe our content acquisition methodology and perform cross-site unsupervised topic clustering on the resulting multilingual dataset. We also perform linguistic and temporal analysis of the web page translations and topics over time, and investigate articles with false publication dates. We make publicly available this new dataset of 14,053 articles, annotated with each language version, and additional metadata such as links and images. The main contribution of this paper for the NLP community is in the novel dataset which enables studies of disinformation networks, and the training of NLP tools for disinformation detection.
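The abstract mentions investigating articles with false publication dates but does not say how they were identified. As a purely illustrative sketch, and not the authors' method, one simple consistency check flags an article when the date printed on the page falls before the site launched or after the page was first observed by a crawler. The `Article` record, its field names, the example URLs, and the `SITE_LAUNCH` value below are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Article:
    """Hypothetical record; field names are assumptions, not the dataset schema."""
    url: str
    language: str
    claimed_date: date  # publication date printed on the page
    first_seen: date    # date a crawler first observed the page


def flag_suspicious_dates(articles, site_launch):
    """Return articles whose claimed date is before the site launch
    or later than the date the page was first observed."""
    flagged = []
    for article in articles:
        too_early = article.claimed_date < site_launch
        too_late = article.claimed_date > article.first_seen
        if too_early or too_late:
            flagged.append(article)
    return flagged


# Made-up sample data for illustration only.
SAMPLE = [
    Article("https://example.org/a", "en", date(2022, 5, 1), date(2022, 5, 2)),
    Article("https://example.org/b", "de", date(2021, 1, 15), date(2022, 6, 1)),
    Article("https://example.org/c", "fr", date(2022, 9, 9), date(2022, 9, 1)),
]
SITE_LAUNCH = date(2022, 3, 1)  # assumed launch date, not taken from the paper

flagged_urls = [a.url for a in flag_suspicious_dates(SAMPLE, SITE_LAUNCH)]
print(flagged_urls)
```

A check like this only surfaces candidates for manual review; a backdated article can still pass if its claimed date happens to fall inside the plausible window.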

