
FakeClaim: A Multiple Platform-driven Dataset for Identification of Fake News on 2023 Israel-Hamas War (2401.16625v1)

Published 29 Jan 2024 in cs.IR and cs.SI

Abstract: We contribute the first publicly available dataset of factual claims from different platforms and fake YouTube videos on the 2023 Israel-Hamas war for automatic fake YouTube video classification. The FakeClaim data is collected from 60 fact-checking organizations in 30 languages and enriched with metadata from the fact-checking organizations curated by trained journalists specialized in fact-checking. Further, we classify fake videos within the subset of YouTube videos using textual information and user comments. We used a pre-trained model to classify each video with different feature combinations. Our best-performing fine-tuned LLM, Universal Sentence Encoder (USE), achieves a Macro F1 of 87%, which shows that the trained model can be helpful for debunking fake videos using the comments from the user discussion. The dataset is available on GitHub: https://github.com/Gautamshahi/FakeClaim
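The abstract reports results as a Macro F1 score. As a rough illustration of that metric only, the sketch below computes macro-averaged F1 for a handful of made-up labels. The function name `macro_f1` and the "fake"/"real" labels are assumptions for illustration; they are not taken from the FakeClaim repository, and this is not the authors' evaluation code.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (illustrative helper)."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1      # correct prediction for this class
        else:
            fp[pred] += 1       # predicted class gets a false positive
            fn[truth] += 1      # true class gets a false negative

    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)

    # Macro averaging weights every class equally, regardless of its size
    return sum(scores) / len(scores)


# Hypothetical labels, not data from the paper
y_true = ["fake", "fake", "real", "real"]
y_pred = ["fake", "real", "real", "real"]
print(round(macro_f1(y_true, y_pred), 4))
```

Because each class contributes equally to the average, macro F1 penalizes a classifier that does well only on the majority class, which is a common reason to report it for imbalanced fake/real datasets.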
