
Quantifying Search Bias: Investigating Sources of Bias for Political Searches in Social Media (1704.01347v1)

Published 5 Apr 2017 in cs.SI, cs.CY, and cs.HC

Abstract: Search systems in online social media sites are frequently used to find information about ongoing events and people. For topics with multiple competing perspectives, such as political events or political candidates, bias in the top ranked results significantly shapes public opinion. However, bias does not emerge from an algorithm alone. It is important to distinguish between the bias that arises from the data that serves as the input to the ranking system and the bias that arises from the ranking system itself. In this paper, we propose a framework to quantify these distinct biases and apply this framework to politics-related queries on Twitter. We found that both the input data and the ranking system contribute significantly to produce varying amounts of bias in the search results and in different ways. We discuss the consequences of these biases and possible mechanisms to signal this bias in social media search systems' interfaces.

Authors (7)
  1. Juhi Kulshrestha (14 papers)
  2. Motahhare Eslami (27 papers)
  3. Johnnatan Messias (20 papers)
  4. Muhammad Bilal Zafar (27 papers)
  5. Saptarshi Ghosh (82 papers)
  6. Krishna P. Gummadi (68 papers)
  7. Karrie Karahalios (16 papers)
Citations (213)

Summary

Analyzing and Quantifying Search Bias in Social Media Political Searches

The paper "Quantifying Search Bias: Investigating Sources of Bias for Political Searches in Social Media" presents a novel framework to measure and characterize biases in search engine results, specifically focusing on political queries in social media platforms such as Twitter. The authors elucidate the distinction between biased input data and algorithmic biases introduced by the ranking mechanisms of search systems, applying their framework to Twitter queries during the politically charged context of the 2016 U.S. Presidential primaries.

Methodology for Quantifying Bias

The framework quantifies search bias by distinguishing three components: the input bias present in the data relevant to a query, the output bias of the final ranked results presented to the user, and the ranking bias, i.e., the additional bias introduced by the ranking algorithm itself.
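To make the relationship between the three components concrete, the following is a minimal sketch rather than the paper's exact estimators: each tweet is assumed to carry a signed leaning score in [-1, 1], the function names and simple averaging are illustrative assumptions, and ranking bias is taken as the gap between output bias and input bias.

```python
def mean_bias(scores):
    """Average signed bias of a collection of items scored in [-1, 1]
    (e.g., -1 = strongly Democratic-leaning, +1 = strongly Republican-leaning)."""
    return sum(scores) / len(scores) if scores else 0.0


def decompose_bias(input_scores, ranked_scores, k=10):
    """Illustrative decomposition of search bias (assumed formulation):
    - input bias: mean leaning of the full pool of tweets matching a query
    - output bias: mean leaning of the top-k results actually shown
    - ranking bias: the shift the ranker adds on top of the input bias
    """
    input_bias = mean_bias(input_scores)
    output_bias = mean_bias(list(ranked_scores)[:k])
    ranking_bias = output_bias - input_bias
    return input_bias, output_bias, ranking_bias
```

Under this reading, a ranking bias near zero means the ranker roughly preserves whatever leaning the input pool already has, while a large positive or negative value indicates the ranker itself shifts the results toward one side.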

To measure political bias in the context of U.S. politics, where two competing perspectives dominate, the framework requires an understanding of how individual data items (tweets, in this case) align with Democratic or Republican viewpoints. The methodology considers both the source bias (the leaning of the user who posted a tweet) and the content bias (the leaning expressed in the tweet text itself).
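As a rough illustration of how these two per-tweet signals might be combined, the sketch below uses an equal weighting of source and content scores; the weighting, the score scale, and the example values are assumptions for exposition, not the paper's calibrated estimators.

```python
def tweet_bias(source_score, content_score, w_source=0.5):
    """Combine the posting user's political leaning (source bias) with the
    leaning of the tweet text itself (content bias) into one per-tweet score.
    Equal weighting is an illustrative default, not the paper's choice."""
    return w_source * source_score + (1.0 - w_source) * content_score


# Hypothetical example: a tweet from a Republican-leaning account (+0.4)
# whose text leans slightly Democratic (-0.2) nets out close to neutral.
print(tweet_bias(0.4, -0.2))  # -> 0.1
```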

Evaluation and Key Findings

Upon applying this framework to political searches on Twitter, several critical insights emerged:

  1. Integrated Biases: The paper recognizes the significance of both input data and ranking systems in shaping the final bias observed in search results. It highlights that while data inherently carries biases, the contribution of ranking algorithms in shifting or even reversing these inherent biases is substantial.
  2. Ranking Dynamics: By examining specific case studies, such as the search results for popular political figures like Hillary Clinton and Donald Trump, the authors observed that the ranking system could either mitigate or amplify the input bias based on the candidate's popularity and public perception.
  3. Variability in Query Wording: Interestingly, slight variations in query phrasing, such as “republican debate” versus “rep debate,” yielded significantly different bias results, underscoring how sensitive search systems are to the wording of natural-language queries and how strongly that sensitivity shapes the political leaning of the retrieved results.

Implications and Future Directions

The paper's findings suggest multiple implications for the design and auditing of search systems:

  • Transparency and Awareness: There's a need for mechanisms in these systems to make end-users aware of potential biases in the results, potentially through explicit signaling or transparency in how results are curated and ranked.
  • Design Modifications: The insight into input versus system-induced biases calls for a refined balance in search algorithms between relevance, popularity, and fairness metrics.
  • Research Extensions: The framework provides a foundation for extending the study of bias to other domains and multi-perspective scenarios, including non-binary political environments and other contentious societal subjects.

While the paper offers a groundbreaking approach to understanding bias in social media search results, it also opens new avenues for further research. Future work could include user-centric studies that assess how bias transparency affects user trust and decision-making, or the development of adaptive algorithms that dynamically adjust rankings to minimize undesirable bias in real time.

Overall, the paper serves as a crucial step towards more ethically aligned, fair search systems in the digital age, especially within politically sensitive contexts. The authors' contribution highlights both the challenges and opportunities in rectifying biases in algorithmically mediated environments.