From Pretraining Data to Language Models to Downstream Tasks: Tracking the Trails of Political Biases Leading to Unfair NLP Models (2305.08283v3)

Published 15 May 2023 in cs.CL

Abstract: Language models (LMs) are pretrained on diverse data sources, including news, discussion forums, books, and online encyclopedias. A significant portion of this data includes opinions and perspectives which, on one hand, celebrate democracy and diversity of ideas, and on the other hand are inherently socially biased. Our work develops new methods to (1) measure political biases in LMs trained on such corpora, along social and economic axes, and (2) measure the fairness of downstream NLP models trained on top of politically biased LMs. We focus on hate speech and misinformation detection, aiming to empirically quantify the effects of political (social, economic) biases in pretraining data on the fairness of high-stakes social-oriented tasks. Our findings reveal that pretrained LMs do have political leanings that reinforce the polarization present in pretraining corpora, propagating social biases into hate speech predictions and misinformation detectors. We discuss the implications of our findings for NLP research and propose future directions to mitigate unfairness.

Analyzing Political Bias in Language Models: From Pretraining to Downstream Tasks

The paper "From Pretraining Data to LLMs to Downstream Tasks: Tracking the Trails of Political Biases Leading to Unfair NLP Models" explores the domain of NLP with a particular focus on LLMs (LMs) and their political biases. This work, authored by Shangbin Feng, Chan Young Park, Yuhan Liu, and Yulia Tsvetkov, explores the potential political biases embedded in LLMs, originating from the pretraining data and propagated through to downstream tasks such as hate speech and misinformation detection.

Key Contributions

This paper makes several novel contributions to the field of NLP and LM analysis:

  1. Political Bias Measurement: The paper introduces a methodology to measure political biases in LMs by placing them on two axes from political theory: economic values, ranging from left to right, and social values, ranging from authoritarian to libertarian.
  2. Impact of Pretraining Data: The authors trace the origins of these biases, showing that further pretraining on left- or right-leaning corpora shifts a model's political leaning in the corresponding direction.
  3. Effect on Downstream Tasks: The work demonstrates how political biases in LMs carry over into downstream models, affecting fairness in high-stakes tasks such as hate speech and misinformation detection. This is shown by fine-tuning differently biased LMs as classifiers and comparing their behavior.

Methodology and Findings

To measure political bias, the paper employs a probing method grounded in political science's two-dimensional spectrum, using mask in-filling (for encoder models) and open-ended generation (for autoregressive models) followed by stance detection to score responses to political statements. The results show clear differences in leaning across LMs, with encoder models such as BERT measuring as more socially conservative than generative models such as GPT.
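
To make the probe concrete, here is a minimal sketch of the mask in-filling idea, assuming the HuggingFace transformers library. The prompt template, example statement, and agree/disagree lexicon are illustrative simplifications: the paper pairs model responses with a trained stance detector rather than a token list.

```python
# Hedged sketch of the mask in-filling probe (illustrative, not the
# authors' exact pipeline): elicit an LM's reaction to a political
# statement, then score agreement with a small token lexicon standing
# in for the paper's stance detector.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")
mask = fill_mask.tokenizer.mask_token  # "[MASK]" for BERT-style models

statement = "The freer the market, the freer the people."  # example statement
prompt = (
    f'Please respond to the following statement: "{statement}" '
    f"I {mask} with this statement."
)

AGREE, DISAGREE = {"agree", "concur"}, {"disagree"}
score = 0.0
for cand in fill_mask(prompt, top_k=10):
    token = cand["token_str"].strip().lower()
    if token in AGREE:
        score += cand["score"]   # probability mass on agreement
    elif token in DISAGREE:
        score -= cand["score"]   # probability mass on disagreement

# Aggregating such scores over many statements yields coordinates on the
# two-dimensional (economic, social) compass.
print(f"net agreement score: {score:+.3f}")
```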

The research then investigates further pretraining on corpora from partisan sources, such as news outlets and Reddit communities with known leanings. This continued pretraining shifts the political leanings of the LMs: left-leaning corpora move models further left on the political spectrum, and right-leaning corpora move them further right. Interestingly, the authors note that social media corpora affect the social axis more strongly than news corpora, suggesting that the type of data shapes which dimension of bias moves.
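
A hedged sketch of such continued pretraining under the masked-LM objective, again assuming HuggingFace transformers and datasets; left_corpus.txt is a placeholder path for a partisan corpus, not the paper's actual data.

```python
# Continued (domain-adaptive) pretraining on a partisan corpus; after
# training, the political-compass probe can be re-run on `model` to
# measure the shift in leaning.
from datasets import load_dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMaskedLM.from_pretrained("roberta-base")

# Placeholder corpus: one left-leaning document per line.
dataset = load_dataset("text", data_files={"train": "left_corpus.txt"})["train"]
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True,
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="lm-left", num_train_epochs=1,
                           per_device_train_batch_size=16),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer,
                                                  mlm_probability=0.15),
)
trainer.train()
```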

In downstream tasks, examined through the lens of hate speech and misinformation detection, models further pretrained on left-leaning data were better at identifying hate speech directed at minority groups, whereas right-leaning models were more effective on content targeting dominant groups.
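
This kind of fairness effect is visible in a simple per-group breakdown. A minimal sketch, with invented toy data rather than the paper's benchmarks:

```python
# Compare a hate speech classifier's accuracy across the identity groups
# targeted by each post; a large gap between groups signals unfairness.
from collections import defaultdict

# (target_group, gold_label, predicted_label) -- toy examples only
predictions = [
    ("black", 1, 1), ("black", 1, 1), ("black", 0, 0),
    ("women", 1, 0), ("women", 1, 1),
    ("white", 1, 1), ("white", 0, 0),
]

hits, totals = defaultdict(int), defaultdict(int)
for group, gold, pred in predictions:
    totals[group] += 1
    hits[group] += int(gold == pred)

per_group = {g: hits[g] / totals[g] for g in totals}
fairness_gap = max(per_group.values()) - min(per_group.values())
print(per_group, f"gap={fairness_gap:.2f}")
```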

Implications and Future Directions

This work underscores the implications of political bias in LMs, especially in applications with social impact, and urges the community to pursue bias mitigation and fairness-enhancement strategies. The authors suggest partisan ensembles, which combine multiple LMs with differing leanings so that no single bias dominates, though this approach requires careful calibration and human oversight.
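
One way such an ensemble could be wired up, as a hedged sketch: average the class probabilities of classifiers fine-tuned from differently leaning LMs. The checkpoint names below are hypothetical placeholders, not released models.

```python
# Partisan ensemble: soft-vote across classifiers built on LMs that were
# further pretrained on left-, center-, and right-leaning corpora.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

CHECKPOINTS = ["hate-clf-left", "hate-clf-center", "hate-clf-right"]  # placeholders

def ensemble_predict(text: str) -> int:
    probs = []
    for ckpt in CHECKPOINTS:
        tokenizer = AutoTokenizer.from_pretrained(ckpt)
        model = AutoModelForSequenceClassification.from_pretrained(ckpt).eval()
        with torch.no_grad():
            logits = model(**tokenizer(text, return_tensors="pt")).logits
        probs.append(torch.softmax(logits, dim=-1))
    # Average probabilities so no single model's leaning dominates.
    return int(torch.stack(probs).mean(dim=0).argmax())
```

Human oversight would still be needed to decide which leanings to include and how to weight them.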

Additionally, the findings imply that while standard pretraining pipelines are susceptible to bias, they also open avenues for building task-specific models by intentionally pretraining on politically representative data. This approach, however, demands caution to avoid exacerbating biases.

Conclusion

The paper offers valuable insight into the political biases inherent in LMs and into ways of mitigating their effects. By tracking the political leanings of LMs across the development lifecycle, from pretraining data to downstream tasks, the authors propose practical mitigations while emphasizing the interplay between model architecture, pretraining data, and task-specific behavior. This research deepens our understanding of model biases and highlights pathways toward fairer NLP systems.

Authors (4)
  1. Shangbin Feng
  2. Chan Young Park
  3. Yuhan Liu
  4. Yulia Tsvetkov