In Plain Sight: Media Bias Through the Lens of Factual Reporting

Published 5 Sep 2019 in cs.CL | (1909.02670v1)

Abstract: The increasing prevalence of political bias in news media calls for greater public awareness of it, as well as robust methods for its detection. While prior work in NLP has primarily focused on the lexical bias captured by linguistic attributes such as word choice and syntax, other types of bias stem from the actual content selected for inclusion in the text. In this work, we investigate the effects of informational bias: factual content that can nevertheless be deployed to sway reader opinion. We first produce a new dataset, BASIL, of 300 news articles annotated with 1,727 bias spans and find evidence that informational bias appears in news articles more frequently than lexical bias. We further study our annotations to observe how informational bias surfaces in news articles by different media outlets. Lastly, a baseline model for informational bias prediction is presented by fine-tuning BERT on our labeled data, indicating the challenges of the task and future directions.




Summary

The paper distinguishes informational bias from lexical bias, describing it as a subtle, context-dependent phenomenon carried by the factual content of news articles.
It leverages the BASIL dataset of 300 news articles organized in triplets from diverse media outlets to perform a comparative bias analysis.
Baseline BERT models reveal challenges in detecting contextual bias, emphasizing the need for improved machine learning approaches.

Analyzing Media Bias Through Factual Reporting

The paper "In Plain Sight: Media Bias Through the Lens of Factual Reporting" investigates media bias with a specific focus on informational bias in news reporting. Using a newly developed dataset, BASIL, the authors aim to identify and categorize how bias manifests in the content that articles choose to include. The work challenges the traditional emphasis on lexical attributes and proposes a view of bias that also accounts for the influence of selected content.

Informational Bias vs. Lexical Bias

Distinction and Prevalence

The authors distinguish lexical bias, which stems from word choice and syntax, from informational bias, which stems from factual content that is selected for inclusion and can nevertheless sway reader opinion. In their annotations, informational bias appears more frequently than lexical bias. It also tends to be harder to spot, since recognizing it requires a contextual understanding of the text.

Figure 1: Distribution of lexical and informational bias spans found in each quartile of an article. The shaded area represents the 95% confidence interval for the three outlets combined.

Characteristics and Detection

In their analysis of BASIL, the authors find that informational bias is spread roughly evenly across the quartiles of an article, whereas lexical bias is concentrated near the beginning. This suggests that news outlets may use lexical choices to capture initial reader attention, while informational bias is embedded more subtly throughout the narrative.
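A per-quartile comparison of this kind can be sketched as follows. This is a minimal illustration, not the authors' analysis code: it assumes a span's position is the character offset where it starts, and the names `quartile_of`, `quartile_shares`, and the toy spans are hypothetical.

```python
from collections import Counter


def quartile_of(span_start: int, article_length: int) -> int:
    """Return the quartile (1-4) of the article in which a span begins."""
    if article_length <= 0:
        raise ValueError("article_length must be positive")
    if not 0 <= span_start < article_length:
        raise ValueError("span_start must fall inside the article")
    return span_start * 4 // article_length + 1


def quartile_shares(spans, article_length):
    """Fraction of spans starting in each quartile, keyed by bias type.

    `spans` is an iterable of (bias_type, start_offset) pairs.
    """
    counts = {}
    for bias_type, start in spans:
        quartile = quartile_of(start, article_length)
        counts.setdefault(bias_type, Counter())[quartile] += 1
    shares = {}
    for bias_type, counter in counts.items():
        total = sum(counter.values())
        shares[bias_type] = [counter[q] / total for q in (1, 2, 3, 4)]
    return shares


# Toy, made-up spans for one 1,000-character article (not BASIL data).
toy_spans = [
    ("lexical", 20), ("lexical", 110), ("lexical", 600),
    ("informational", 50), ("informational", 300),
    ("informational", 550), ("informational", 900),
]
shares = quartile_shares(toy_spans, 1000)
```

In this toy input the informational spans fall one per quartile, while two of the three lexical spans start in the first quartile, which mirrors the qualitative pattern described above.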

Dataset and Annotation Process

Data Collection and Triplet System

The BASIL dataset comprises 300 news articles arranged in 100 triplets. Each triplet covers the same event as reported by three media sources with different ideological leanings: Fox News, the New York Times, and the Huffington Post. This design allows a direct comparison of how different outlets frame the same story.
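One way to picture the triplet structure is to group article records by event and keep only the events covered by all three outlets. The sketch below is illustrative only: the record keys `event_id`, `outlet`, and `text` are assumptions, not the dataset's actual field names.

```python
from collections import defaultdict

OUTLETS = {"Fox News", "New York Times", "Huffington Post"}


def group_into_triplets(articles):
    """Group article records by event and keep only complete triplets.

    Each record is a dict with the hypothetical keys
    "event_id", "outlet", and "text".
    """
    by_event = defaultdict(dict)
    for article in articles:
        by_event[article["event_id"]][article["outlet"]] = article
    return {
        event_id: by_outlet
        for event_id, by_outlet in by_event.items()
        if set(by_outlet) == OUTLETS
    }


# Toy records: event "e1" is a full triplet, event "e2" is missing an outlet.
toy_articles = [
    {"event_id": "e1", "outlet": "Fox News", "text": "..."},
    {"event_id": "e1", "outlet": "New York Times", "text": "..."},
    {"event_id": "e1", "outlet": "Huffington Post", "text": "..."},
    {"event_id": "e2", "outlet": "Fox News", "text": "..."},
    {"event_id": "e2", "outlet": "New York Times", "text": "..."},
]
triplets = group_into_triplets(toy_articles)
```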

Figure 2: Our JavaScript annotation tool at various steps.

Annotation Methodology

The dataset is annotated for both informational and lexical bias. Annotators follow a detailed schema that labels each biased span of text with its bias type, target entity, and polarity. Because these judgments often depend on context beyond the span itself, that dependence shapes both the dataset's structure and its usefulness for machine learning applications.
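A span-level record with these three attributes might be represented as below. This is a hypothetical sketch rather than BASIL's actual file format: the class name `BiasSpan`, the field names, the character-offset convention, and the polarity values "positive" and "negative" are all assumptions made for illustration.

```python
from dataclasses import dataclass

BIAS_TYPES = ("lexical", "informational")
POLARITIES = ("positive", "negative")  # assumed label set


@dataclass(frozen=True)
class BiasSpan:
    start: int       # character offset where the span begins (inclusive)
    end: int         # character offset where the span ends (exclusive)
    bias_type: str   # one of BIAS_TYPES
    target: str      # entity the bias is directed at
    polarity: str    # one of POLARITIES

    def __post_init__(self):
        # Reject records that do not fit the assumed schema.
        if not 0 <= self.start < self.end:
            raise ValueError("span offsets must satisfy 0 <= start < end")
        if self.bias_type not in BIAS_TYPES:
            raise ValueError(f"unknown bias type: {self.bias_type!r}")
        if self.polarity not in POLARITIES:
            raise ValueError(f"unknown polarity: {self.polarity!r}")

    def text(self, article: str) -> str:
        """Return the annotated substring of the article."""
        return article[self.start:self.end]


# Invented example sentence, not taken from the dataset.
article = "The senator, who was fined in 2012, praised the bill."
phrase = "who was fined in 2012"
start = article.index(phrase)
span = BiasSpan(start, start + len(phrase), "informational", "the senator", "negative")
```

The example span is factual on its face, yet its inclusion casts the target in a negative light, which is the kind of case the informational label is meant to capture.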

Bias Detection Challenges

Machine Learning Approaches

The paper presents baseline models for bias detection, obtained by fine-tuning BERT on the labeled data. The results show that detecting informational bias is difficult, because it relies on contextual and sometimes implicit cues that go beyond surface-level lexical features.
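This summary does not describe how the task is framed for the model. One common framing for span annotations, assumed here purely for illustration, is token-level tagging with B-/I-/O labels. The sketch below shows only that label-preparation step, uses whitespace tokens rather than BERT's WordPiece tokens, and the function name `bio_labels` is hypothetical.

```python
import re


def bio_labels(text, spans):
    """Label whitespace tokens with B-/I-/O tags from character-level spans.

    `spans` is a list of (start, end, bias_type) tuples with `end` exclusive.
    A token belongs to a span when their character ranges overlap.
    """
    tokens, labels = [], []
    open_span = None  # span covering the previous token, if any
    for match in re.finditer(r"\S+", text):
        tok_start, tok_end = match.span()
        hit = next(
            (s for s in spans if tok_start < s[1] and s[0] < tok_end),
            None,
        )
        if hit is None:
            label = "O"
        elif hit is open_span:
            label = "I-" + hit[2]  # continuing the same span
        else:
            label = "B-" + hit[2]  # first token of a new span
        open_span = hit
        tokens.append(match.group())
        labels.append(label)
    return tokens, labels


# Invented example sentence, not taken from the dataset.
text = "The senator, fined in 2012, praised the bill."
phrase = "fined in 2012"
start = text.index(phrase)
tokens, labels = bio_labels(text, [(start, start + len(phrase), "informational")])
```

Labels prepared this way could then be aligned to subword tokens before fine-tuning a token-classification model; that alignment step is omitted here.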

Figure 3: Percentage of bias spans with negative polarity toward targets of known ideology, grouped by media source, bias type, and target's ideology.
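The grouping behind a figure like this can be sketched as a simple aggregation over span records. The code is an illustration under stated assumptions: the record keys (`source`, `bias_type`, `target_ideology`, `polarity`) are hypothetical, and the toy records use placeholder outlet names and invented values rather than BASIL data.

```python
from collections import defaultdict


def negative_share(spans):
    """Percentage of negative-polarity spans per
    (source, bias_type, target_ideology) group."""
    totals = defaultdict(int)
    negatives = defaultdict(int)
    for span in spans:
        key = (span["source"], span["bias_type"], span["target_ideology"])
        totals[key] += 1
        if span["polarity"] == "negative":
            negatives[key] += 1
    return {key: 100.0 * negatives[key] / totals[key] for key in totals}


# Made-up records with placeholder outlets, for illustration only.
toy_records = [
    {"source": "Outlet A", "bias_type": "informational",
     "target_ideology": "liberal", "polarity": "negative"},
    {"source": "Outlet A", "bias_type": "informational",
     "target_ideology": "liberal", "polarity": "positive"},
    {"source": "Outlet A", "bias_type": "lexical",
     "target_ideology": "conservative", "polarity": "negative"},
    {"source": "Outlet B", "bias_type": "informational",
     "target_ideology": "conservative", "polarity": "positive"},
]
percentages = negative_share(toy_records)
```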

Comparative Analysis and Future Directions

The results from the BERT models highlight the need for incorporating broader contextual knowledge, potentially from entire articles or aligned content across different sources, to effectively identify informational bias. Future work could explore methodologies to improve the contextual understanding of models by leveraging discourse structures and external knowledge bases.

Implications and Future Research

The insights drawn from this paper have both practical and theoretical implications. Practically, they underscore the importance of media literacy that can recognize bias woven into factual content. Theoretically, the study broadens the understanding of media bias and prompts further exploration of how content framing works and how it affects audiences.

In conclusion, this work delineates a clear pathway for future research on media bias, emphasizing the necessity to develop robust detection systems that can parse the intricate layers of informational bias. As the field progresses, such research will be pivotal in fostering a more critically engaged public discourse around media content.
