
Predicting Factuality of Reporting and Bias of News Media Sources (1810.01765v1)

Published 2 Oct 2018 in cs.IR, cs.LG, and stat.ML

Abstract: We present a study on predicting the factuality of reporting and bias of news media. While previous work has focused on studying the veracity of claims or documents, here we are interested in characterizing entire news media. These are under-studied but arguably important research problems, both in their own right and as a prior for fact-checking systems. We experiment with a large list of news websites and with a rich set of features derived from (i) a sample of articles from the target news medium, (ii) its Wikipedia page, (iii) its Twitter account, (iv) the structure of its URL, and (v) information about the Web traffic it attracts. The experimental results show sizable performance gains over the baselines, and confirm the importance of each feature type.

Overview of "Predicting Factuality of Reporting and Bias of News Media Sources"

This paper addresses the automated prediction of factuality and bias in news media sources, a significant problem in today's digital information landscape. The key contribution of this research is a predictive model that estimates the factuality of reporting and identifies the bias (political orientation) of news media. Traditionally, research has concentrated on debunking misinformation at the claim or article level. This paper, however, innovatively shifts the focus to an entire news media source—an under-studied but critical facet of misinformation research.

Methodology and Data

The authors employ a diverse set of features derived from several sources: articles from the news medium, the medium's Wikipedia page, its Twitter account, the structure of its URL, and web traffic information. This multi-faceted approach aims to capture various dimensions of a news source's reliability. The dataset created for this paper consists of over 1,000 news media sources, annotated manually for both factuality and bias, making it substantially larger than datasets used in previous work.

For factuality, a 3-point scale is used (Low, Mixed, High), while bias is measured on a 7-point ordinal scale ranging from Extreme-Left to Extreme-Right. The classifiers are SVMs trained to optimize the macro-averaged F1 score, and the experimental results show sizable gains over baseline models.
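Macro-averaged F1, the metric the authors optimize, averages the per-class F1 scores so that rare classes (e.g. Low-factuality sources) count as much as common ones. A minimal sketch, using the paper's 3-point factuality scale (the example labels below are illustrative, not the paper's data):

```python
def macro_f1(gold, pred):
    """Macro-averaged F1: compute F1 per class, then take the unweighted mean."""
    labels = sorted(set(gold) | set(pred))
    f1s = []
    for c in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == c and p == c)
        fp = sum(1 for g, p in zip(gold, pred) if g != c and p == c)
        fn = sum(1 for g, p in zip(gold, pred) if g == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

gold = ["high", "high", "mixed", "low", "low", "mixed"]
pred = ["high", "mixed", "mixed", "low", "mixed", "mixed"]
print(round(macro_f1(gold, pred), 3))  # 0.667
```

Because every class contributes equally to the average, a classifier cannot score well by simply predicting the majority class, which is why macro-F1 is a common choice for imbalanced annotation scales like these.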

Findings

The experimental results underscore that article content features, including linguistic attributes, sentiment, topic-driven features, and complexity indicators, are crucial for estimating factuality. Wikipedia and Twitter also contribute meaningfully, although their influence varies between tasks. URLs and web traffic metrics, while not as impactful individually, provide additional context, especially when integrated into a holistic model.

The paper's ablation study reveals that article features are the most critical, yet combining diverse feature types further enhances performance, highlighting the complexity of assessing media reliability. Importantly, the integrated model identifies bias and assesses factuality with higher accuracy and lower error rates than the baselines.
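A leave-one-group-out ablation of this kind can be sketched as a simple loop over the paper's five feature groups. The group names follow the paper; the `evaluate` function here is a placeholder (it does not train a real classifier and its scores are not the paper's numbers), included only so the loop is runnable end to end:

```python
# Feature groups from the paper: articles, Wikipedia, Twitter, URL, traffic.
FEATURE_GROUPS = ["articles", "wikipedia", "twitter", "url", "traffic"]

def evaluate(groups):
    """Placeholder for: train an SVM on the given feature groups and
    return its macro-F1. Here it just returns a score proportional to
    the number of groups used, so the ablation loop below executes."""
    return 0.1 * len(groups)

full = evaluate(FEATURE_GROUPS)
for held_out in FEATURE_GROUPS:
    remaining = [g for g in FEATURE_GROUPS if g != held_out]
    drop = full - evaluate(remaining)
    print(f"without {held_out:9s}: macro-F1 drops by {drop:.2f}")
```

In the real study, the size of each drop indicates how much the held-out feature group contributes; the paper finds the largest drop when article features are removed.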

Implications and Future Directions

The implications of this work are twofold: first, it provides a foundation for automatic systems that could assist fact-checkers by flagging dubious sources, potentially streamlining the fact-checking process. Second, it offers insights for researchers focusing on media studies and political communication by quantifying media bias and reliability.

Looking forward, there are several paths for future exploration. Treating the task as ordinal regression could improve prediction accuracy, particularly in capturing subtle gradations of bias and factuality. Furthermore, there is potential for expanding the model to account for types of media bias beyond the traditional left-right dichotomy, allowing for a more global application. Extending the approach to languages beyond English could also be instrumental in understanding media bias in non-Western contexts, providing a more universal tool for combating misinformation.
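One standard way to realize the ordinal-regression suggestion is the threshold-decomposition reduction (Frank and Hall style): a K-point scale becomes K-1 binary "is the label above threshold k?" classifiers. This is a hedged sketch of that reduction applied to the paper's 7-point bias scale, not the paper's own method:

```python
# The paper's 7-point bias scale, ordered from left to right.
BIAS_SCALE = ["extreme-left", "left", "center-left", "center",
              "center-right", "right", "extreme-right"]

def to_binary_targets(label):
    """Encode an ordinal label as K-1 'rank > k?' bits, one per
    threshold; a separate binary classifier is trained for each."""
    rank = BIAS_SCALE.index(label)
    return [int(rank > k) for k in range(len(BIAS_SCALE) - 1)]

def from_binary_votes(votes):
    """Decode K-1 binary predictions: the estimated rank is the
    number of positive 'above threshold' answers."""
    return BIAS_SCALE[sum(votes)]

print(to_binary_targets("center-right"))     # [1, 1, 1, 1, 0, 0]
print(from_binary_votes([1, 1, 0, 0, 0, 0]))  # center-left
```

Unlike a flat multi-class classifier, this encoding lets the model pay a smaller penalty for near-misses (e.g. predicting Center-Left for a Left outlet) than for errors across the whole scale, which matches the ordinal structure of both the bias and factuality labels.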

Overall, this research enriches the discourse on media reliability, offering robust methods and comprehensive datasets that pave the way for advanced AI-driven approaches to curbing misinformation.

Authors (5)
  1. Ramy Baly
  2. Georgi Karadzhov
  3. Dimitar Alexandrov
  4. James Glass
  5. Preslav Nakov
Citations: 226