
A Stylometric Inquiry into Hyperpartisan and Fake News (1702.05638v1)

Published 18 Feb 2017 in cs.CL

Abstract: This paper reports on a writing style analysis of hyperpartisan (i.e., extremely one-sided) news in connection to fake news. It presents a large corpus of 1,627 articles that were manually fact-checked by professional journalists from BuzzFeed. The articles originated from 9 well-known political publishers, 3 each from the mainstream, the hyperpartisan left-wing, and the hyperpartisan right-wing. In sum, the corpus contains 299 fake news, 97% of which originated from hyperpartisan publishers. We propose and demonstrate a new way of assessing style similarity between text categories via Unmasking---a meta-learning approach originally devised for authorship verification---, revealing that the style of left-wing and right-wing news have a lot more in common than any of the two have with the mainstream. Furthermore, we show that hyperpartisan news can be discriminated well by its style from the mainstream (F1=0.78), as can be satire from both (F1=0.81). Unsurprisingly, style-based fake news detection does not live up to scratch (F1=0.46). Nevertheless, the former results are important to implement pre-screening for fake news detectors.

Authors (5)
  1. Martin Potthast (64 papers)
  2. Johannes Kiesel (8 papers)
  3. Kevin Reinartz (1 paper)
  4. Janek Bevendorff (8 papers)
  5. Benno Stein (44 papers)
Citations (586)

Summary

  • The paper applies Unmasking, a meta-learning approach originally devised for authorship verification, to compare style across text categories, and finds that left-wing and right-wing hyperpartisan news resemble each other far more than either resembles the mainstream.
  • Style separates hyperpartisan from mainstream news with F1=0.78 and satire from both with F1=0.81, whereas style-based fake news detection reaches only F1=0.46.
  • The paper argues that style-based detection of hyperpartisan news is useful as a pre-screening step for fake news detectors, flagging articles for further review rather than judging their veracity.

A Stylometric Inquiry into Hyperpartisan and Fake News

The paper studies the writing style of hyperpartisan (extremely one-sided) news and its connection to fake news. It builds on a corpus of 1,627 articles from 9 well-known political publishers (3 mainstream, 3 hyperpartisan left-wing, 3 hyperpartisan right-wing), each manually fact-checked by professional journalists from BuzzFeed. Of these articles, 299 are fake news, 97% of which originated from hyperpartisan publishers. The central question is whether style alone separates hyperpartisan from mainstream news, which the authors examine with a meta-learning technique called Unmasking.

Key Findings and Methodology

The paper adapts Unmasking, originally devised for authorship verification, to measure style similarity between whole text categories rather than between individual authors. Applied to left-wing, right-wing, and mainstream news, it shows that the two hyperpartisan orientations have far more in common stylistically with each other than either has with the mainstream. This contradicts the assumption that political orientation implies a distinct writing style.
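The general idea behind Unmasking can be illustrated with a small sketch: repeatedly train a classifier to separate two sets of text chunks, record its accuracy, then remove the most discriminative features and repeat. If accuracy collapses after only a few removals, the two sets differ in just a handful of surface features and are otherwise similar in style. The code below is not the authors' implementation. It assumes a nearest-centroid classifier in place of the linear classifier normally used, ranks features by the gap between class means, and assumes chunks of roughly equal length so that raw word counts are comparable. The function names and parameters (`unmasking_curve`, `n_vocab`, `n_remove`, `n_rounds`) are hypothetical.

```python
from collections import Counter


def featurize(chunk, vocab):
    """Raw count of each vocabulary word in a chunk (chunks assumed equal-sized)."""
    counts = Counter(chunk.lower().split())
    return [counts[w] for w in vocab]


def centroid(rows):
    """Column-wise mean of a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def loo_accuracy(xs_a, xs_b, active):
    """Leave-one-out accuracy of a nearest-centroid classifier on the active features."""
    data = [(x, 0) for x in xs_a] + [(x, 1) for x in xs_b]
    correct = 0
    for i, (x, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        cents = [centroid([r for r, lab in rest if lab == k]) for k in (0, 1)]
        dists = [sum((x[j] - c[j]) ** 2 for j in active) for c in cents]
        pred = 0 if dists[0] <= dists[1] else 1  # ties go to class 0
        correct += pred == label
    return correct / len(data)


def unmasking_curve(chunks_a, chunks_b, n_vocab=50, n_remove=2, n_rounds=5):
    """Accuracy per round while the most discriminative features are removed.

    Each class needs at least two chunks so leave-one-out never empties a class.
    """
    all_words = " ".join(chunks_a + chunks_b).lower().split()
    vocab = [w for w, _ in Counter(all_words).most_common(n_vocab)]
    xs_a = [featurize(c, vocab) for c in chunks_a]
    xs_b = [featurize(c, vocab) for c in chunks_b]
    active = list(range(len(vocab)))
    curve = []
    for _ in range(n_rounds):
        if not active:
            break
        curve.append(loo_accuracy(xs_a, xs_b, active))
        ca, cb = centroid(xs_a), centroid(xs_b)
        # Proxy for classifier weights: gap between the two class means.
        ranked = sorted(active, key=lambda j: abs(ca[j] - cb[j]), reverse=True)
        dropped = set(ranked[:n_remove])
        active = [j for j in active if j not in dropped]
    return curve


if __name__ == "__main__":
    left = ["alpha alpha the of and", "alpha the of and alpha", "alpha alpha and the of"]
    right = ["beta beta the of and", "beta the of and beta", "beta beta and the of"]
    print(unmasking_curve(left, right))
```

In this toy input the two sets differ only in one word each, so accuracy starts high and falls to chance as soon as those words are removed. A slowly declining curve would instead indicate deeper stylistic differences.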

Quantitatively, style separates hyperpartisan news from mainstream news well (F1=0.78), and satire from both (F1=0.81). Style-based detection of fake news itself performs poorly (F1=0.46), so fake news detectors need signals beyond writing style.
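For readers unfamiliar with the metric, F1 is the harmonic mean of precision and recall. The sketch below computes it from confusion counts. The counts in the usage line are hypothetical and chosen only to land on 0.78; they are not taken from the paper, and this summary does not state how the reported scores were averaged across classes.

```python
def f1_score(tp, fp, fn):
    """F1 from true positives, false positives, and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical counts: 78 correct flags, 22 false alarms, 22 misses.
    print(round(f1_score(tp=78, fp=22, fn=22), 2))
```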

The classifiers rely on a style-based feature set that includes character n-grams, stop words, and readability scores. Because these features identify hyperpartisan content reasonably well, style-based detection can serve as a pre-screening step that narrows down which articles go on to more detailed fact-checking.
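A minimal sketch of what such style features can look like in practice follows. It is an illustration under stated assumptions, not the paper's feature extractor: the stop-word list is a tiny made-up sample, character trigrams are one possible choice of n, and the Automated Readability Index stands in for the readability scores, since this summary does not name which scores were used. The names `style_features` and `STOP_WORDS` are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not the paper's list).
STOP_WORDS = ("the", "a", "of", "and", "to", "in")


def char_ngrams(text, n=3):
    """Counts of overlapping character n-grams in lowercased text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def stop_word_ratios(text):
    """Share of all word tokens taken up by each stop word."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = max(len(words), 1)
    return {w: counts[w] / total for w in STOP_WORDS}


def automated_readability_index(text):
    """ARI = 4.71 * chars/words + 0.5 * words/sentences - 21.43."""
    words = re.findall(r"[A-Za-z0-9']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        return 0.0
    chars = sum(len(w) for w in words)
    return 4.71 * chars / len(words) + 0.5 * len(words) / len(sentences) - 21.43


def style_features(text):
    """Merge all three feature groups into one name -> value mapping."""
    feats = {f"char3:{g}": c for g, c in char_ngrams(text).items()}
    feats.update({f"stop:{w}": r for w, r in stop_word_ratios(text).items()})
    feats["ari"] = automated_readability_index(text)
    return feats


if __name__ == "__main__":
    feats = style_features("The cat sat. The dog ran!")
    print(feats["char3:the"], feats["stop:the"], feats["ari"])
```

None of these features looks at what an article claims, only at how it is written, which is consistent with their use for pre-screening rather than for judging veracity.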

Implications and Future Work

The results matter for automated systems that detect biased and potentially misleading information. Hyperpartisan style can be recognized automatically, and 97% of the fake news in the corpus originated from hyperpartisan publishers, so a style-based pre-screening step can flag articles for closer review within a content verification pipeline. It cannot replace fact-checking, given the low score of style-based fake news detection (F1=0.46).

Future work could refine the style models to keep pace with changes in deceptive content, and could combine them with semantic web technologies and linked open data for a broader approach to containing misinformation.

In summary, the paper shows that style separates hyperpartisan from mainstream news (F1=0.78) and satire from both (F1=0.81), that left-wing and right-wing hyperpartisan news share much of their style, and that style alone does not reliably detect fake news (F1=0.46).