Stanceosaurus 2.0: Classifying Stance Towards Russian and Spanish Misinformation (2402.03642v1)
Abstract: The Stanceosaurus corpus (Zheng et al., 2022) was designed to provide high-quality, annotated, 5-way stance data extracted from Twitter, suitable for analyzing cross-cultural and cross-lingual misinformation. In the Stanceosaurus 2.0 iteration, we extend this framework to encompass Russian and Spanish. The former is of current significance due to prevalent misinformation amid escalating tensions with the West and the violent incursion into Ukraine. The latter, meanwhile, represents an enormous community that has been largely overlooked on major social media platforms. By incorporating an additional 3,874 Spanish and Russian tweets over 41 misinformation claims, our objective is to support research focused on these issues. To demonstrate the value of this data, we employed zero-shot cross-lingual transfer on multilingual BERT, yielding results on par with the initial Stanceosaurus study with a macro F1 score of 43 for both languages. This underlines the viability of stance classification as an effective tool for identifying multicultural misinformation.
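For readers unfamiliar with the metric reported above, the following is a minimal sketch of how a macro F1 score over five stance classes can be computed. The label names, the `macro_f1` helper, and the toy predictions are assumptions made for illustration; the abstract only specifies a 5-way label set and does not describe the evaluation code. The function returns a value in [0, 1], so the reported score of 43 would correspond to 0.43 if it is expressed on a 0 to 100 scale.

```python
from collections import Counter

# Label names are assumed for illustration; the abstract only says "5-way stance".
LABELS = ["supporting", "refuting", "discussing", "querying", "irrelevant"]


def macro_f1(gold, pred, labels=LABELS):
    """Unweighted mean of per-class F1 over the given label set."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[g] += 1  # gold class gains a false negative
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(labels)


# Toy example (fabricated labels, not data from the corpus).
gold = ["supporting", "refuting", "irrelevant", "irrelevant", "querying", "discussing"]
pred = ["supporting", "irrelevant", "irrelevant", "irrelevant", "querying", "supporting"]
score = macro_f1(gold, pred)
print(round(score, 4))
```

Because every class contributes equally to the average, rare stance classes weigh as much as frequent ones, which is why macro F1 is a common choice for imbalanced stance data.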
- Language-independent fake news detection: English, Portuguese, and Spanish mutual features. Future Internet, 12(5).
- Keith B. Alexander. 2017. Disinformation: A primer in Russian active measures and influence campaigns. Prepared statement, United States Senate Select Committee on Intelligence.
- On the cross-lingual transferability of monolingual representations. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics.
- Just say no: Analyzing the stance of neural dialogue generation in offensive contexts. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 4846–4862, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
- Lessons learned from monitoring Spanish-language vaccine misinformation during the COVID-19 pandemic. Public Health Rep, 138(4):586–592. PMID: 37102367; PMCID: PMC10140774.
- Class-balanced loss based on effective number of samples. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
- BERT: Pre-training of deep bidirectional transformers for language understanding.
- F. Dreizin and T. Priestly. 1982. A systematic approach to Russian obscene language. Russ Linguist, 6:233–249.
- Bharath Ganesh and Jonathan Bright. 2020. Countering extremists on social media: Challenges for strategic communication and content moderation.
- L. Giorio. 2018. War on Propaganda or PRopaganda War?: A case study of fact-checking and (counter)propaganda in the EEAS project EUvsDisinfo. Dissertation, Uppsala University, Jagiellonian University.
- RumourEval 2019: Determining rumour veracity and support for rumours.
- A survey of current datasets for code-switching research. In 2020 6th International Conference on Advanced Computing and Communication Systems (ICACCS), pages 136–141.
- How language-neutral is multilingual BERT?
- Stance prediction for Russian: Data and analysis.
- Lauren A. McCarthy et al. 2023. Four months of “discrediting the military”: Repressive law in wartime Russia. Demokratizatsiya: The Journal of Post-Soviet Democratization.
- A hierarchical network-oriented analysis of user participation in misinformation spread on WhatsApp. Information Processing & Management, 59(1):102757.
- Challenges and opportunities in information manipulation detection: An examination of wartime Russian media.
- Pew. 2015. A majority of English-speaking Hispanics in the U.S. are bilingual. Accessed: 2023-06-11.
- Carol W. Pfaff. 1979. Constraints on language mixing: Intrasentential code-switching and borrowing in Spanish/English. Language, 55(2):291–318.
- How multilingual is multilingual BERT?
- Alina Polyakova and Chris Meserole. 2019. Exporting digital authoritarianism: The Russian and Chinese models. Policy Brief, Democracy and Disorder Series, pages 1–22.
- Juan-Pablo Posadas-Durán et al. 2019. Detection of fake news in a new corpus for the Spanish language. Journal of Intelligent & Fuzzy Systems, 36(5):4869–4876.
- Susceptibility to misinformation about COVID-19 around the world. R. Soc. open sci., 7:201199.
- Stance detection benchmark: How robust is your stance detection? Künstl Intell, 35:329–341.
- Automated multilingual detection of pro-Kremlin propaganda in newspapers and Telegram posts. Datenbank Spektrum, 23:5–14.
- The evolution of pro-Kremlin propaganda from a machine learning and linguistics perspective. In Proceedings of the Second Ukrainian Natural Language Processing Workshop (UNLP), pages 40–48, Dubrovnik, Croatia. Association for Computational Linguistics.
- StatCounter. 2023. Social media stats Spain. Accessed: 2023-06-11.
- Statista. 2023a. Ranking of social media platforms in Russia Q3 2022, by user share. Accessed: 2023-05-26.
- Statista. 2023b. Social media usage in Latin America - statistics & facts. Accessed: 2023-06-11.
- Joanna Taylor and Claudia Pagliari. 2018. Mining social media data: How are research sponsors and researchers addressing the ethical challenges? Research Ethics, 14(2):1–39.
- Multilingual argument mining: Datasets and analysis. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 303–317, Online. Association for Computational Linguistics.
- Joseph E. Uscinski and Ryden W. Butler. 2013. The epistemology of fact checking. Critical Review, 25(2):162–180.
- Are multilingual models effective in code-switching?
- Shijie Wu and Mark Dredze. 2019. Beto, Bentz, Becas: The surprising cross-lingual effectiveness of BERT.
- Stanceosaurus: Classifying stance towards multilingual misinformation.
- Multilingual stance detection in tweets: The Catalonia independence corpus. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 1368–1375, Marseille, France. European Language Resources Association.