- The paper distinguishes lexical bias from informational bias, a subtle, context-dependent form of bias embedded in the content of news reporting.
- It introduces the BASIL dataset of 300 news articles, organized in triplets that cover the same event from three media outlets, to support a comparative bias analysis.
- Baseline BERT models reveal challenges in detecting contextual bias, emphasizing the need for improved machine learning approaches.
The paper "In Plain Sight: Media Bias Through the Lens of Factual Reporting" investigates media bias with a specific focus on informational bias in news reporting. Using a newly developed dataset, BASIL, the authors identify and categorize how bias manifests in news content. The work challenges traditional accounts of bias that emphasize lexical attributes and proposes a more nuanced view that accounts for the influence of the content an outlet selects to report.
Distinction and Prevalence
The authors argue that while lexical bias—bias characterized by word choice and syntax—is significant, informational bias, which is embedded at the content level in terms of how events and entities are framed, often exerts a greater influence on reader perception. Not only is informational bias more prevalent in media text than its lexical counterpart, but it also tends to be more insidious since it requires a contextual understanding of the text.

Figure 1: Distribution of lexical and informational bias spans found in each quartile of an article. The shaded area represents the 95% confidence interval for the three outlets combined.
Characteristics and Detection
Through their analysis of BASIL, the authors found that informational bias is evenly distributed across articles. In contrast, lexical bias is concentrated primarily at the beginning of articles. This suggests that news outlets may employ lexical choices to capture initial reader attention, whereas informational bias is more subtly embedded throughout the narrative structure.
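The quartile comparison described above can be sketched in a few lines. This is a minimal illustration, not the authors' analysis code: the tuple layout `(bias_type, span_start, article_length)`, the function names, and the toy data are all assumptions made for the example.

```python
from collections import Counter


def quartile_of(span_start: int, article_length: int) -> int:
    """Return the quartile (1-4) of the article in which a span begins."""
    if article_length <= 0:
        raise ValueError("article_length must be positive")
    if not 0 <= span_start < article_length:
        raise ValueError("span_start must fall inside the article")
    return min(4, span_start * 4 // article_length + 1)


def quartile_distribution(spans):
    """Map each bias type to the fraction of its spans in each quartile.

    `spans` is an iterable of (bias_type, span_start, article_length)
    tuples; this layout is a hypothetical one chosen for the sketch.
    """
    counts = {}
    for bias_type, start, length in spans:
        counts.setdefault(bias_type, Counter())[quartile_of(start, length)] += 1
    return {
        bias_type: {q: c[q] / sum(c.values()) for q in range(1, 5)}
        for bias_type, c in counts.items()
    }


# Toy data invented for illustration only (not taken from BASIL).
toy_spans = [
    ("lexical", 10, 1000),
    ("lexical", 120, 1000),
    ("lexical", 600, 1000),
    ("informational", 50, 1000),
    ("informational", 300, 1000),
    ("informational", 550, 1000),
    ("informational", 900, 1000),
]
distribution = quartile_distribution(toy_spans)
```

In this toy input the lexical spans cluster in the first quartile while the informational spans fall one per quartile, which mirrors the shape of the pattern the authors report but says nothing about its actual magnitude.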
Dataset and Annotation Process
Data Collection and Triplet System
The BASIL dataset comprises 300 news articles aligned in triplets, each covering the same event from three distinct media sources with varying ideological leanings: Fox News, the New York Times, and Huffington Post. This triplet system allows for a comparative analysis of how different outlets frame the same story.
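With 300 articles and three articles per event, the triplet system yields 300 / 3 = 100 aligned triplets. The sketch below shows one way such an alignment could be represented; the `Article` fields, the `event_id` key, and `build_triplets` are hypothetical names for illustration and do not reflect the dataset's actual file format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Article:
    event_id: str  # hypothetical key shared by the articles on one event
    source: str
    title: str


# The three outlets named in the text.
EXPECTED_SOURCES = {"Fox News", "New York Times", "Huffington Post"}


def build_triplets(articles):
    """Group articles by event and keep only complete three-outlet triplets."""
    by_event = defaultdict(dict)
    for article in articles:
        if article.source in by_event[article.event_id]:
            raise ValueError(
                f"duplicate source {article.source!r} for event {article.event_id!r}"
            )
        by_event[article.event_id][article.source] = article
    return {
        event_id: by_source
        for event_id, by_source in by_event.items()
        if set(by_source) == EXPECTED_SOURCES
    }


# Toy records invented for illustration only.
toy_articles = [
    Article("event-1", "Fox News", "Headline A"),
    Article("event-1", "New York Times", "Headline B"),
    Article("event-1", "Huffington Post", "Headline C"),
    Article("event-2", "Fox News", "Headline D"),  # incomplete triplet
]
triplets = build_triplets(toy_articles)
```

Keying each triplet by source makes side-by-side comparison straightforward: `triplets["event-1"]["Fox News"]` and `triplets["event-1"]["New York Times"]` are guaranteed to describe the same event.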


Figure 2: The authors' JavaScript annotation tool at various steps.
Annotation Methodology
The dataset is annotated for both informational and lexical bias. Annotators follow a detailed schema that classifies each span of text by bias type, target entity, and polarity. Because the two bias types interact and often depend on context, annotation requires a close reading of each article, which shapes both the dataset's structure and its usefulness for machine learning applications.
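A span-level record carrying the three attributes named above (bias type, target entity, polarity) might look like the following. This is a sketch under stated assumptions: the class and field names are hypothetical, character offsets are an assumed way of locating spans, and the positive/negative polarity values are an assumption rather than a reproduction of the authors' schema.

```python
from dataclasses import dataclass
from enum import Enum


class BiasType(Enum):
    LEXICAL = "lexical"
    INFORMATIONAL = "informational"


class Polarity(Enum):  # assumed value set for this sketch
    POSITIVE = "positive"
    NEGATIVE = "negative"


@dataclass(frozen=True)
class BiasSpan:
    start: int  # character offset where the span begins (inclusive)
    end: int    # character offset where the span ends (exclusive)
    bias_type: BiasType
    target: str  # entity the bias is directed at
    polarity: Polarity

    def __post_init__(self):
        if not 0 <= self.start < self.end:
            raise ValueError("span must satisfy 0 <= start < end")
        if not self.target.strip():
            raise ValueError("target must be non-empty")

    def text(self, article: str) -> str:
        """Return the annotated substring of `article`."""
        if self.end > len(article):
            raise ValueError("span extends past the end of the article")
        return article[self.start:self.end]


# Invented sentence for illustration only (not from BASIL).
article = "The senator, who has faced criticism before, spoke on Tuesday."
phrase = "who has faced criticism before"
start = article.index(phrase)
span = BiasSpan(
    start=start,
    end=start + len(phrase),
    bias_type=BiasType.INFORMATIONAL,
    target="the senator",
    polarity=Polarity.NEGATIVE,
)
```

Storing offsets rather than copied text keeps each annotation tied to its position in the article, which is what a positional analysis of bias spans needs.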
Bias Detection Challenges
Machine Learning Approaches
The paper presents baseline models for bias detection using fine-tuned BERT architectures. These models reveal significant challenges in detecting informational bias, which intrinsically relies on contextual and sometimes implicit cues that go beyond surface-level lexical features.
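One common way to pose span detection for a fine-tuned encoder is token-level tagging, where each token receives a begin/inside/outside (BIO) label. The summary does not specify how the baselines were framed, so the sketch below is an assumption about a plausible preprocessing step, not the authors' pipeline. It uses plain whitespace tokenization in place of BERT's subword tokenizer to stay self-contained.

```python
import re


def whitespace_tokens(text):
    """Split on whitespace, keeping each token's character offsets."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def bio_labels(text, spans):
    """Convert character-level spans to token-level BIO labels.

    `spans` is a list of (start, end, bias_type) tuples with an exclusive
    end offset; this layout is hypothetical. A token is labeled if it
    overlaps a span by at least one character.
    """
    tokens = whitespace_tokens(text)
    labels = ["O"] * len(tokens)
    for span_start, span_end, bias_type in spans:
        inside = False
        for i, (_, tok_start, tok_end) in enumerate(tokens):
            if tok_start < span_end and tok_end > span_start:
                labels[i] = ("I-" if inside else "B-") + bias_type
                inside = True
    return [tok for tok, _, _ in tokens], labels


# Invented sentence for illustration only (not from BASIL).
sentence = "Critics slammed the reckless plan on Monday."
word = "reckless"
begin = sentence.index(word)
tokens, labels = bio_labels(sentence, [(begin, begin + len(word), "lexical")])
```

Here only the token `reckless` is tagged `B-lexical`. The example also hints at why informational bias is harder: a lexically biased word is visible within its own token, whereas an informational span may look neutral unless the model sees the surrounding article.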
Figure 3: Percentage of bias spans with negative polarity toward targets of known ideology, grouped by media source, bias type, and target's ideology.
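The quantity plotted in Figure 3 is a grouped percentage: within each (media source, bias type, target ideology) group, the number of negative-polarity spans divided by the total number of spans. A minimal sketch of that aggregation follows; the tuple layout, the function name, and the toy records (with placeholder outlet names) are assumptions for illustration and carry no information about the paper's actual results.

```python
from collections import defaultdict


def negative_share(spans):
    """Percentage of negative-polarity spans per (source, bias_type, ideology).

    `spans` is an iterable of (source, bias_type, target_ideology, polarity)
    tuples; this layout is hypothetical.
    """
    totals = defaultdict(int)
    negatives = defaultdict(int)
    for source, bias_type, target_ideology, polarity in spans:
        key = (source, bias_type, target_ideology)
        totals[key] += 1
        if polarity == "negative":
            negatives[key] += 1
    return {key: 100.0 * negatives[key] / totals[key] for key in totals}


# Toy records invented for illustration only.
toy_records = [
    ("Outlet A", "informational", "liberal", "negative"),
    ("Outlet A", "informational", "liberal", "negative"),
    ("Outlet A", "informational", "liberal", "positive"),
    ("Outlet A", "informational", "liberal", "positive"),
    ("Outlet A", "lexical", "conservative", "negative"),
]
shares = negative_share(toy_records)
```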
Comparative Analysis and Future Directions
The results from the BERT models highlight the need for incorporating broader contextual knowledge, potentially from entire articles or aligned content across different sources, to effectively identify informational bias. Future work could explore methodologies to improve the contextual understanding of models by leveraging discourse structures and external knowledge bases.
Implications and Future Research
The paper's findings have both practical and theoretical implications. Practically, they underscore the importance of media literacy that can recognize bias woven into factual content. Theoretically, the study expands the understanding of media bias and prompts further exploration of how content framing works and how it affects audiences psychologically and socially.
In conclusion, this work delineates a clear pathway for future research on media bias, emphasizing the necessity to develop robust detection systems that can parse the intricate layers of informational bias. As the field progresses, such research will be pivotal in fostering a more critically engaged public discourse around media content.