FakeClaim: A Multiple Platform-driven Dataset for Identification of Fake News on 2023 Israel-Hamas War (2401.16625v1)
Abstract: We contribute the first publicly available dataset of factual claims from different platforms and fake YouTube videos on the 2023 Israel-Hamas war for automatic fake YouTube video classification. The FakeClaim data is collected from 60 fact-checking organizations in 30 languages and enriched with metadata from the fact-checking organizations curated by trained journalists specialized in fact-checking. Further, we classify fake videos within the subset of YouTube videos using textual information and user comments. We use a pre-trained model to classify each video with different feature combinations. Our best-performing fine-tuned LLM, Universal Sentence Encoder (USE), achieves a Macro F1 of 87%, which shows that the trained model can be helpful for debunking fake videos using the comments from the user discussion. The dataset is available on GitHub: https://github.com/Gautamshahi/FakeClaim
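For readers unfamiliar with the reported metric, Macro F1 is the unweighted mean of the per-class F1 scores, so a minority class counts as much as the majority class. The following is a minimal sketch of that computation, not the authors' evaluation code; the function name `macro_f1`, the class names, and the toy labels are assumptions made up for illustration and are not drawn from FakeClaim.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Return the unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels, invented for illustration only (not FakeClaim data).
y_true = ["fake", "fake", "fake", "other", "other", "other"]
y_pred = ["fake", "fake", "other", "other", "other", "fake"]
score = macro_f1(y_true, y_pred)
print(f"{score:.3f}")  # 0.667
```

In this toy case each class has two true positives, one false positive, and one false negative, so both per-class F1 scores are 2/3 and the macro average is also 2/3.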