FakeNewsNet: A Data Repository with News Content, Social Context and Spatialtemporal Information for Studying Fake News on Social Media (1809.01286v3)

Published 5 Sep 2018 in cs.SI

Abstract: Social media has become a popular means for people to consume news. Meanwhile, it also enables the wide dissemination of fake news, i.e., news with intentionally false information, which brings significant negative effects to society. Thus, fake news detection is attracting increasing attention. However, fake news detection is a non-trivial task, which requires multi-source information such as news content, social context, and dynamic information. First, fake news is written to fool people, which makes it difficult to detect fake news based on news content alone. In addition to news content, we need to explore social context such as user engagements and social behaviors. For example, a credible user's comment that "this is fake news" is a strong signal for detecting fake news. Second, dynamic information, such as how fake news and true news propagate and how users' opinions toward news pieces evolve, is very important for extracting useful patterns for (early) fake news detection and intervention. Thus, comprehensive datasets which contain news content, social context, and dynamic information could facilitate research on fake news propagation, detection, and mitigation; however, to the best of our knowledge, existing datasets only contain one or two of these aspects. Therefore, in this paper, to facilitate fake news-related research, we provide a fake news data repository, FakeNewsNet, which contains two comprehensive datasets that include news content, social context, and dynamic information. We present a comprehensive description of dataset collection, demonstrate an exploratory analysis of this data repository from different perspectives, and discuss the benefits of FakeNewsNet for potential applications in fake news study on social media.

Exploring FakeNewsNet: A Comprehensive Repository for Fake News Detection on Social Media

The proliferation of fake news on social media platforms is a significant concern, influencing public opinion and even real-world events. The paper "FakeNewsNet: A Data Repository with News Content, Social Context, and Spatiotemporal Information for Studying Fake News on Social Media" by Shu et al. aims to tackle this growing problem by presenting a detailed and multifaceted data repository called FakeNewsNet. This repository includes two comprehensive datasets rich in news content, social context, and spatiotemporal features, providing a robust foundation for various fake news research applications.

Overview and Methodology

FakeNewsNet addresses a limitation of existing datasets, which typically cover only one or two of news content, social context, and dynamic information, by integrating all three. Ground-truth labels come from the fact-checking websites PolitiFact and GossipCop, and a custom-designed data crawling framework collects the corresponding news content and social context. The repository consists of two datasets: one from the political news domain (PolitiFact) and one from the entertainment news domain (GossipCop).

Key Features:

News Content: The repository contains news articles with associated metadata such as text, images, and publish dates, for news pieces whose veracity has been labeled by the fact-checking sites.
Social Context: This includes comprehensive details of user engagements, such as tweets, retweets, likes, replies, and associated user profiles. The repository's depth extends to second-order engagements and social network structures.
Spatiotemporal Information: These attributes capture the dynamic propagation patterns of news articles, including temporal engagement data and spatial distribution of user interactions.
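To make the three feature groups concrete, here is a minimal Python sketch of how one labeled news item and its engagements might be represented, plus a helper that turns engagement timestamps into an hourly propagation curve. The class and field names (NewsItem, Engagement, hourly_engagement_counts) are hypothetical and do not reflect FakeNewsNet's actual file layout.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional

# Hypothetical schema: illustrative field names, not FakeNewsNet's actual format.


@dataclass
class Engagement:
    """One user engagement (tweet, retweet, or reply) with a news piece."""
    user_id: str
    kind: str                       # e.g. "tweet", "retweet", "reply"
    timestamp: datetime             # temporal information
    location: Optional[str] = None  # spatial information, if the profile has one


@dataclass
class NewsItem:
    """A news piece with its content, label, and social context."""
    news_id: str
    title: str
    text: str
    label: str                      # "fake" or "real", from a fact-checking site
    engagements: List[Engagement] = field(default_factory=list)


def hourly_engagement_counts(item: NewsItem) -> List[int]:
    """Count engagements per hour, measured from the earliest engagement."""
    if not item.engagements:
        return []
    start = min(e.timestamp for e in item.engagements)
    hours = Counter(
        int((e.timestamp - start).total_seconds() // 3600)
        for e in item.engagements
    )
    # Fill hours with no activity with zeros so the curve is contiguous.
    return [hours.get(h, 0) for h in range(max(hours) + 1)]
```

Hourly binning is an arbitrary choice here; a finer or coarser window only requires changing the 3600-second divisor.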

Analytical Insights

Preliminary analysis of the datasets shows differences between fake and real news in content, user engagement patterns, and temporal behavior. For example, bots appear to be disproportionately involved in propagating fake news, a finding consistent with other studies of computational propaganda.
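A bot-involvement comparison of this kind can be sketched as the share of distinct engaging users, per news label, whose bot score exceeds a threshold. This sketch assumes each user already has a bot-likelihood score in [0, 1] from an external detector (Botometer is one such tool); the 0.5 threshold and the function name bot_fraction_by_label are illustrative assumptions.

```python
from typing import Dict, Iterable, Set, Tuple


def bot_fraction_by_label(
    engagements: Iterable[Tuple[str, str]],
    bot_scores: Dict[str, float],
    threshold: float = 0.5,  # assumed cutoff, not taken from the paper
) -> Dict[str, float]:
    """Fraction of distinct engaging users flagged as bots, per news label.

    engagements: (user_id, label) pairs, e.g. ("u1", "fake").
    bot_scores: user_id -> bot likelihood in [0, 1], assumed precomputed.
    Users without a score are skipped.
    """
    users_by_label: Dict[str, Set[str]] = {}
    for user_id, label in engagements:
        if user_id in bot_scores:
            # A set counts each user once, however often they engage.
            users_by_label.setdefault(label, set()).add(user_id)
    return {
        label: sum(bot_scores[u] > threshold for u in users) / len(users)
        for label, users in users_by_label.items()
    }
```

Counting distinct users rather than raw engagements keeps one very active account from dominating the ratio; counting engagements instead would measure bot activity volume rather than bot prevalence.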

Moreover, sentiment analysis of user replies indicates that fake news tends to draw more negative sentiment than real news. These insights, while preliminary, underscore the repository's potential for fine-grained research.
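One simple way to compare reply sentiment across labels is the share of replies whose score falls below a negativity cutoff. This sketch assumes replies have already been scored in [-1, 1] by an external sentiment analyzer; the -0.05 cutoff follows a common convention for VADER compound scores and, like the function name negative_reply_share, is an assumption rather than the paper's procedure.

```python
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def negative_reply_share(
    replies: Iterable[Tuple[str, float]], neg_threshold: float = -0.05
) -> Dict[str, float]:
    """Share of replies with sentiment below neg_threshold, per news label.

    replies: (label, score) pairs; scores in [-1, 1] are assumed to come
    from an external sentiment analyzer.
    """
    scores_by_label: Dict[str, List[float]] = {}
    for label, score in replies:
        scores_by_label.setdefault(label, []).append(score)
    # The mean of 0/1 indicators is the fraction of negative replies.
    return {
        label: mean(1.0 if s < neg_threshold else 0.0 for s in scores)
        for label, scores in scores_by_label.items()
    }
```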

Evaluation and Performance

The authors evaluated fake news detection on both datasets with several representative models: Support Vector Machines (SVM), Logistic Regression (LR), Naive Bayes (NB), Convolutional Neural Networks (CNN), and Social Article Fusion (SAF). SAF, which leverages both news content and social context, outperforms models relying solely on news content or social context.
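The core idea behind combining the two signal types can be illustrated with early fusion: concatenate a content feature vector with a social-context feature vector and train one classifier on the result. This is not SAF's architecture; the sketch swaps in plain logistic regression trained by gradient descent, and the helper names (fuse, train_logreg, predict) and learning settings are hypothetical.

```python
import math
from typing import List, Sequence

# Illustrative early fusion, NOT the SAF architecture: content and
# social-context features are concatenated and fed to logistic regression.


def fuse(content: Sequence[float], social: Sequence[float]) -> List[float]:
    """Concatenate a news-content vector with a social-context vector."""
    return list(content) + list(social)


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X: List[List[float]], y: List[int],
                 lr: float = 0.5, epochs: int = 500) -> List[float]:
    """Full-batch gradient descent on log loss; the last weight is the bias."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            # zip stops at len(xi), so w[-1] (the bias) is added separately.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + w[-1]) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def predict(w: List[float], x: Sequence[float]) -> int:
    """Return 1 (fake) if the fused score is positive, else 0 (real)."""
    return int(sum(wj * xj for wj, xj in zip(w, x)) + w[-1] > 0)
```

In a toy setup where the content feature is identical for every item and only the social feature differs, a content-only model cannot separate the classes, while the fused model can; this mirrors, in miniature, why adding social context may help.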

Performance Highlights:

PolitiFact Dataset: SAF achieves an accuracy of 0.691, higher than content-based models such as SVM (0.580) and CNN (0.629).
GossipCop Dataset: Here too, SAF displays superior performance with an accuracy of 0.689.
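The PolitiFact gaps can be made explicit with a little arithmetic on the accuracies reported above: SAF is about 11 percentage points above SVM and about 6 above CNN, roughly 19% and 10% relative improvements. The helper names are illustrative, and these differences alone say nothing about statistical significance.

```python
# Accuracies on PolitiFact as reported in the summary above.
POLITIFACT_ACCURACY = {"SVM": 0.580, "CNN": 0.629, "SAF": 0.691}


def absolute_gain(scores: dict, model: str, baseline: str) -> float:
    """Accuracy difference as a fraction of test items."""
    return round(scores[model] - scores[baseline], 3)


def relative_gain_pct(scores: dict, model: str, baseline: str) -> float:
    """Accuracy improvement relative to the baseline, in percent."""
    return round(100 * (scores[model] - scores[baseline]) / scores[baseline], 1)


for base in ("SVM", "CNN"):
    print(base,
          absolute_gain(POLITIFACT_ACCURACY, "SAF", base),
          relative_gain_pct(POLITIFACT_ACCURACY, "SAF", base))
```

Running it prints "SVM 0.111 19.1" and "CNN 0.062 9.9".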

These results suggest that the social context the repository provides can improve detection beyond what news content alone supports.

Implications and Future Directions

FakeNewsNet contributes to fake news research in both practical and theoretical ways. By combining news content, social context, and spatiotemporal information, it fills gaps left by single-aspect datasets and supports detection models that draw on several of these aspects at once. Its rich feature set also opens avenues for investigating how fake news evolves, designing mitigation strategies, and studying the role of malicious accounts such as bots.

Potential Applications:

Fake News Detection: Advanced models can be developed using the content, social context, and spatiotemporal data.
Fake News Evolution: Understanding how fake news propagates and transforms over time.
Mitigation Strategies: Developing interventions to minimize the spread or influence of fake news.
Malicious Account Detection: Identifying social bots and other automated agents that promote disinformation.

Future work can expand the repository to include additional datasets, refine the data collection strategies to reduce noise, and integrate with front-end tools for real-time fake news monitoring and analysis.

Conclusion

The paper by Shu et al. presents FakeNewsNet as a structured, multi-dimensional effort to study and counteract fake news on social media. Its varied datasets open the way to new approaches in fake news detection and analysis, making it a valuable resource for researchers working to mitigate the impact of disinformation.

Authors (5)

Kai Shu (88 papers)
Deepak Mahudeswaran (2 papers)
Suhang Wang (118 papers)
Dongwon Lee (65 papers)
Huan Liu (283 papers)

Citations (789)
