Fake News Detection on Social Media: A Data Mining Perspective (1708.01967v3)

Published 7 Aug 2017 in cs.SI and cs.AI

Abstract: Social media for news consumption is a double-edged sword. On the one hand, its low cost, easy access, and rapid dissemination of information lead people to seek out and consume news from social media. On the other hand, it enables the wide spread of "fake news", i.e., low quality news with intentionally false information. The extensive spread of fake news has the potential for extremely negative impacts on individuals and society. Therefore, fake news detection on social media has recently become an emerging research that is attracting tremendous attention. Fake news detection on social media presents unique characteristics and challenges that make existing detection algorithms from traditional news media ineffective or not applicable. First, fake news is intentionally written to mislead readers to believe false information, which makes it difficult and nontrivial to detect based on news content; therefore, we need to include auxiliary information, such as user social engagements on social media, to help make a determination. Second, exploiting this auxiliary information is challenging in and of itself as users' social engagements with fake news produce data that is big, incomplete, unstructured, and noisy. Because the issue of fake news detection on social media is both challenging and relevant, we conducted this survey to further facilitate research on the problem. In this survey, we present a comprehensive review of detecting fake news on social media, including fake news characterizations on psychology and social theories, existing algorithms from a data mining perspective, evaluation metrics and representative datasets. We also discuss related research areas, open problems, and future research directions for fake news detection on social media.


Fake News Detection on Social Media

The paper "Fake News Detection on Social Media: A Data Mining Perspective" by Kai Shu, Amy Sliva, Suhang Wang, Jiliang Tang, and Huan Liu provides an in-depth exploration of the challenge of detecting fake news on social media platforms. The goal of this research is to present a structured review of the methods and techniques relevant to the detection of fake news, emphasizing the unique opportunities and challenges posed by social media environments.

Key Characteristics and Challenges

The authors describe social media's double-edged effect on news dissemination: it makes sharing information cheap and fast, but it also enables the wide spread of fake news, which the paper defines as news that is intentionally and verifiably false. The paper highlights two challenges specific to this domain:

Content Manipulation: Fake news is deliberately crafted to mislead readers, rendering traditional content-based detection methods insufficient.
Auxiliary Information Exploitation: The vast, unstructured, and noisy data from user interactions on social media complicates the detection process.

Psychological and Social Foundations

The paper discusses the psychological underpinnings that make consumers vulnerable to fake news, including naïve realism and confirmation bias, as well as the social dynamics, such as the echo chamber effect, driven by social media networks. This context is crucial for understanding why fake news is so pervasive and difficult to counteract in the digital age.

Feature Extraction for Fake News Detection

The paper splits detection into two phases, feature extraction and model construction. Features are drawn from two sources, the news content itself and its social context:

News Content Features:
- Linguistic Features: Common linguistic markers (syntax, lexical features) and domain-specific markers (quotes, links) that hint at deception.
- Visual Features: Image analysis to detect sensationalism or manipulation.
Social Context Features:
- User-based Features: Profiling users to identify malicious entities such as bots or trolls.
- Post-based Features: Analyzing the emotional and opinionated responses from users.
- Network-based Features: Examining the structure of the information propagation network to identify echo chambers and influential nodes.
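
As a rough illustration of the linguistic features listed above, the following minimal sketch computes a few lexical counts (word count, average word length) and domain-specific counts (quotes, links) from a piece of text. The feature names, the regular expressions, and the `linguistic_features` function are assumptions made for this example; the survey does not prescribe a specific feature set, and real systems typically use much richer features.

```python
import re

# Hypothetical patterns chosen for illustration only.
URL_PATTERN = re.compile(r"https?://\S+")
QUOTE_PATTERN = re.compile(r'"[^"]+"')
WORD_PATTERN = re.compile(r"[A-Za-z']+")


def linguistic_features(text: str) -> dict:
    """Return a few simple lexical and domain-specific counts for a news text."""
    # Remove links before tokenizing so URL fragments are not counted as words.
    text_without_links = URL_PATTERN.sub(" ", text)
    words = WORD_PATTERN.findall(text_without_links)
    n_words = len(words)
    return {
        "num_words": n_words,
        "avg_word_length": sum(len(w) for w in words) / n_words if n_words else 0.0,
        "num_links": len(URL_PATTERN.findall(text)),
        "num_quotes": len(QUOTE_PATTERN.findall(text)),
        "num_exclamations": text.count("!"),
    }


if __name__ == "__main__":
    sample = 'Shocking! The senator said "I never voted" according to http://example.com/story'
    print(linguistic_features(sample))
```

The resulting dictionary could be turned into a feature vector and passed to any standard classifier during the model construction phase.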

Detection Models

Detection methodologies are categorized into two primary model types:

News Content Models:
- Knowledge-based: Validates news content against external fact-checking databases.
- Style-based: Identifies deceptive writing styles and lack of objectivity indicative of fake news.
Social Context Models:
- Stance-based: Derives the stance of posts relative to the news, leveraging crowd wisdom.
- Propagation-based: Uses diffusion patterns within social networks to infer credibility.
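
To make the stance-based idea more concrete, here is a minimal sketch that aggregates the stances of posts about a news item into a single score. It is a simplified weighted vote used as a stand-in, not a method taken from the paper. The `aggregate_stance` function, the +1/-1 stance encoding, and the per-post weights (which could stand for user credibility) are all assumptions made for this example.

```python
from typing import Iterable, Tuple


def aggregate_stance(posts: Iterable[Tuple[int, float]]) -> float:
    """Combine post stances into a score in [-1, 1].

    Each post is a (stance, weight) pair, where stance is +1 if the post
    supports the news item and -1 if it denies or questions it, and weight
    is a non-negative importance value (hypothetical, e.g. user credibility).
    Returns 0.0 when there is no weighted evidence.
    """
    posts = list(posts)  # allow any iterable, since we traverse it twice
    total_weight = sum(weight for _, weight in posts)
    if total_weight == 0:
        return 0.0
    return sum(stance * weight for stance, weight in posts) / total_weight


if __name__ == "__main__":
    # One supporting post and two denying posts, with differing weights.
    example_posts = [(1, 1.0), (-1, 2.0), (-1, 1.0)]
    print(aggregate_stance(example_posts))
```

A score near -1 means the weighted crowd mostly disputes the item, and a score near +1 means it mostly supports it. How such a score maps to a credibility label would have to be learned or tuned on annotated data.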

Evaluation Metrics and Datasets

Detection methods are typically evaluated with classification metrics such as precision, recall, F1-score, and accuracy, along with the area under the ROC curve (AUC), which is more informative when the classes are imbalanced. Available datasets vary in completeness and relevance, and include the BuzzFeedNews, LIAR, BS Detector, and CREDBANK datasets.
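
The metrics named above can be computed directly from predictions, as in the following minimal sketch. It assumes binary labels where 1 marks fake news (the positive class) and 0 marks real news; that encoding and the function names `classification_metrics` and `roc_auc` are choices made for this example. The AUC is computed with the standard pairwise-ranking formulation: the fraction of (positive, negative) pairs in which the positive item receives the higher score, with ties counted as one half.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Precision, recall, F1, and accuracy for binary labels (1 = fake, 0 = real)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a fake item outranks a real item (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(classification_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0]))
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

The pairwise AUC loop is quadratic in the number of examples, which is fine for illustration; for large datasets a sorting-based implementation or an established library routine would be a better fit.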

Implications and Future Directions

The paper concludes by outlining several future research directions:

Data-oriented Research: Enhanced dataset creation, exploiting temporal and psychological data patterns.
Feature-oriented Research: Developing advanced feature extraction techniques, particularly for visual content.
Model-oriented Research: Building more complex and effective models, including semi-supervised and unsupervised approaches to overcome the challenge of limited annotated data.
Application-oriented Research: Broader applications include understanding fake news diffusion patterns and developing intervention methods to mitigate the spread of fake news.

Conclusion

This comprehensive survey underscores the need for multi-faceted approaches to effectively detect fake news in social media settings. The integration of content analysis and social context, supported by sophisticated machine learning models, is crucial to combat this pervasive issue. Future research, particularly in data collection and model enhancement, promises significant advancements in fake news detection capabilities. The ongoing development of benchmark datasets and novel intervention techniques will further support researchers and practitioners in addressing this problem.


