Unveiling Global Narratives: A Multilingual Twitter Dataset of News Media on the Russo-Ukrainian Conflict (2306.12886v2)

Published 22 Jun 2023 in cs.CL and cs.DL

Abstract: The ongoing Russo-Ukrainian conflict has been a subject of intense media coverage worldwide. Understanding the global narrative surrounding this topic is crucial for researchers who aim to gain insights into its multifaceted dimensions. In this paper, we present a novel multimedia dataset that focuses on this topic by collecting and processing tweets posted by news and media companies across the globe. We collected tweets from February 2022 to May 2023, yielding approximately 1.5 million tweets in 60 different languages along with their images. Each entry in the dataset is accompanied by processed tags that identify entities, stances, textual or visual concepts, and sentiment. This multimedia dataset is a valuable resource for researchers investigating the global narrative surrounding the ongoing conflict from various angles: who the prominent entities are, what stances are taken, where these stances originate, and how the different textual and visual concepts related to the event are portrayed.
