
MCFEND: A Multi-source Benchmark Dataset for Chinese Fake News Detection (2403.09092v2)

Published 14 Mar 2024 in cs.CL and cs.AI

Abstract: The prevalence of fake news across various online sources has had a significant influence on the public. Existing Chinese fake news detection datasets are limited to news sourced solely from Weibo. However, fake news originating from multiple sources is diverse in many respects, including its content and social context. Methods trained on a single news source are hardly applicable to real-world scenarios. Our pilot experiment demonstrates that the F1 score of the state-of-the-art method trained on a large Chinese fake news detection dataset, Weibo-21, drops significantly from 0.943 to 0.470 when the test data is changed to multi-source news data, failing to identify more than one-third of the multi-source fake news. To address this limitation, we constructed the first multi-source benchmark dataset for Chinese fake news detection, termed MCFEND, which is composed of news we collected from diverse sources such as social platforms, messaging apps, and traditional online news outlets. Notably, this news has been fact-checked by 14 authoritative fact-checking agencies worldwide. In addition, various existing Chinese fake news detection methods are thoroughly evaluated on our proposed dataset in cross-source, multi-source, and unseen-source settings. MCFEND, as a benchmark dataset, aims to advance Chinese fake news detection approaches in real-world scenarios.
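The F1 score quoted in the abstract combines precision and recall, so "failing to identify more than one-third of the fake news" corresponds to a recall below roughly two-thirds on the fake class. The following is a minimal sketch of how such a score can be computed from binary labels. It assumes that label 1 means "fake" and that F1 is taken on that positive class; the abstract does not state which averaging the authors use, and the toy labels below are illustrative only, not data from MCFEND or Weibo-21.

```python
def f1_score(y_true, y_pred, positive=1):
    """Compute F1 for one class from two equal-length label sequences.

    Assumption: `positive` marks the "fake" class. This is a generic
    definition of F1, not the paper's evaluation code.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against empty denominators when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels: three fake items (1) and three real items (0).
toy_true = [1, 1, 1, 0, 0, 0]
toy_pred = [1, 0, 0, 0, 0, 1]  # the detector misses two of the three fake items
toy_f1 = f1_score(toy_true, toy_pred)
print(round(toy_f1, 3))
```

In this toy case the detector finds one of three fake items (recall 1/3) with precision 1/2, which shows how a low recall on fake news pulls the F1 score down even when half of the flagged items are correct.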

Authors (4)
  1. Yupeng Li
  2. Haorui He
  3. Jin Bai
  4. Dacheng Wen