
Fake News Early Detection: An Interdisciplinary Study (1904.11679v2)

Published 26 Apr 2019 in cs.CL and cs.SI

Abstract: Massive dissemination of fake news and its potential to erode democracy has increased the demand for accurate fake news detection. Recent advancements in this area have proposed novel techniques that aim to detect fake news by exploring how it propagates on social networks. Nevertheless, to detect fake news at an early stage, i.e., when it is published on a news outlet but not yet spread on social media, one cannot rely on news propagation information as it does not exist. Hence, there is a strong need to develop approaches that can detect fake news by focusing on news content. In this paper, a theory-driven model is proposed for fake news detection. The method investigates news content at various levels: lexicon-level, syntax-level, semantic-level and discourse-level. We represent news at each level, relying on well-established theories in social and forensic psychology. Fake news detection is then conducted within a supervised machine learning framework. As an interdisciplinary research, our work explores potential fake news patterns, enhances the interpretability in fake news feature engineering, and studies the relationships among fake news, deception/disinformation, and clickbaits. Experiments conducted on two real-world datasets indicate the proposed method can outperform the state-of-the-art and enable fake news early detection when there is limited content information.

Overview of "Fake News Early Detection: An Interdisciplinary Study"

The paper "Fake News Early Detection: An Interdisciplinary Study" by Xinyi Zhou, Atishay Jain, Vir V. Phoha, and Reza Zafarani presents a comprehensive approach to detecting fake news by focusing exclusively on the content rather than relying on social media propagation metrics. This addresses the critical challenge of early-stage detection when fake news is freshly published but not yet widely shared.

The authors propose a theory-driven model that integrates insights from social and forensic psychology to enhance feature interpretability and effectiveness in uncovering deception patterns in text. The model deconstructs news content into several linguistic levels: lexicon, syntax, semantics, and discourse, with a particular emphasis on well-established psychological theories such as the Undeutsch hypothesis and information manipulation theory. This interdisciplinary angle aims to make machine learning-based fake news detection more interpretable and grounded in domain-specific theoretical perspectives.

Key Methodological Contributions

  • Linguistic Feature Extraction: The model derives features at several linguistic levels. At the lexicon level, it uses a standard bag-of-words representation; at the syntax level, it captures both shallow and deep syntactic patterns via part-of-speech tags and production (rewrite) rules. At the semantic level, it assesses psycho-linguistic attributes such as sentiment and subjectivity, drawing on psychological theories of deception and on clickbait characteristics. Finally, discourse-level features are extracted by examining rhetorical relations within the text (a simplified feature-extraction sketch follows this list).
  • Machine Learning Framework: The proposed feature sets are integrated into a supervised machine learning framework to classify news articles as fake or true. The authors explore multiple classifiers, including Logistic Regression, Naïve Bayes, Support Vector Machine, Random Forest, and XGBoost, to ensure robustness and performance efficiency.
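
To make the multi-level representation concrete, below is a minimal sketch of lexicon-, syntax-, and semantic-level feature extraction using off-the-shelf libraries (scikit-learn, NLTK, and TextBlob). The library choices and the specific features are illustrative assumptions, not the paper's implementation; the actual feature set is considerably richer (e.g., production rules for deep syntax, LIWC-style psycho-linguistic categories, and rhetorical-structure features at the discourse level).

```python
# Illustrative sketch only: the paper's real feature set (production rules,
# LIWC categories, rhetorical relations, etc.) is much richer than this.
import numpy as np
import nltk
from nltk import pos_tag, word_tokenize
from sklearn.feature_extraction.text import CountVectorizer
from textblob import TextBlob

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

def lexicon_features(corpus, max_features=1000):
    """Lexicon level: standard bag-of-words counts."""
    vectorizer = CountVectorizer(max_features=max_features)
    return vectorizer.fit_transform(corpus).toarray(), vectorizer

def shallow_syntax_features(text):
    """Syntax level (shallow): relative frequencies of a few POS tags
    (the tag subset here is an assumption for illustration)."""
    tags = [tag for _, tag in pos_tag(word_tokenize(text))]
    total = max(len(tags), 1)
    tagset = ["NN", "NNP", "VB", "VBD", "JJ", "RB", "PRP"]
    return [tags.count(t) / total for t in tagset]

def semantic_features(text):
    """Semantic level: sentiment polarity and subjectivity, used here as
    simple proxies for the psycho-linguistic attributes discussed above."""
    blob = TextBlob(text)
    return [blob.sentiment.polarity, blob.sentiment.subjectivity]

def build_features(corpus):
    """Concatenate bag-of-words with handcrafted syntax/semantic features."""
    bow, _ = lexicon_features(corpus)
    handcrafted = np.array(
        [shallow_syntax_features(doc) + semantic_features(doc) for doc in corpus]
    )
    return np.hstack([bow, handcrafted])
```

Discourse-level features are omitted here because they require a rhetorical-structure parser; the rest of the pipeline would simply append them as additional columns.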

Experimental Findings

The paper evaluates the approach on two real-world datasets—PolitiFact and BuzzFeed news articles—and compares it against contemporary state-of-the-art detection models. Notably, the proposed model achieves superior accuracy and F1 scores of around 88%, indicating its effectiveness in detecting fake news based solely on content. These results suggest the model can reliably flag fake news with minimal available information, highlighting its potential for early intervention against the proliferation of misinformation (a simplified evaluation sketch follows).
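
As a rough illustration of how such a comparison could be run, the snippet below cross-validates the classifiers listed earlier and reports mean accuracy and F1 on a feature matrix like the one sketched above. The model settings, fold count, and metrics setup are placeholder assumptions, not the paper's actual experimental configuration.

```python
# Hedged sketch: classifier comparison with accuracy and F1, assuming a
# feature matrix X and binary labels y (1 = fake, 0 = true) already exist.
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from xgboost import XGBClassifier

classifiers = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Naive Bayes": GaussianNB(),
    "SVM": SVC(kernel="linear"),
    "Random Forest": RandomForestClassifier(n_estimators=200),
    "XGBoost": XGBClassifier(eval_metric="logloss"),
}

def compare_classifiers(X, y, folds=5):
    """Print mean cross-validated accuracy and F1 for each model."""
    for name, clf in classifiers.items():
        scores = cross_validate(clf, X, y, cv=folds,
                                scoring=("accuracy", "f1"))
        print(f"{name:20s} acc={scores['test_accuracy'].mean():.3f} "
              f"f1={scores['test_f1'].mean():.3f}")
```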

Implications and Future Directions

  • Practical Applications: The model's ability to function effectively without propagation information is particularly beneficial for media outlets and fact-checkers wishing to mitigate the spread of misinformation before it escalates on social platforms. It also provides valuable insights into enhancing the transparency and accountability of automated content verification tools.
  • Theoretical Insights: By grounding news detection capabilities in psychological theories, the paper opens pathways for integrating human cognitive patterns with algorithmic efficiency, bridging a gap between social sciences and computational approaches in misinformation studies.
  • Future Prospects: The work suggests potential extensions in incorporating multi-modal data such as imagery and expanding the model across diverse languages and cultural contexts. Additionally, further exploration of the interplay between clickbait and fake news content may provide deeper insights into crafting more sophisticated detection systems.

This research underscores the value of combining theoretical and empirical analyses to build explainable AI systems for automated news credibility assessment, contributing significantly to the body of work on computational journalism and misinformation resilience strategies.
