
The Political Preferences of LLMs (2402.01789v2)

Published 2 Feb 2024 in cs.CY, cs.AI, and cs.CL

Abstract: I report here a comprehensive analysis about the political preferences embedded in LLMs. Namely, I administer 11 political orientation tests, designed to identify the political preferences of the test taker, to 24 state-of-the-art conversational LLMs, both closed and open source. When probed with questions/statements with political connotations, most conversational LLMs tend to generate responses that are diagnosed by most political test instruments as manifesting preferences for left-of-center viewpoints. This does not appear to be the case for five additional base (i.e. foundation) models upon which LLMs optimized for conversation with humans are built. However, the weak performance of the base models at coherently answering the tests' questions makes this subset of results inconclusive. Finally, I demonstrate that LLMs can be steered towards specific locations in the political spectrum through Supervised Fine-Tuning (SFT) with only modest amounts of politically aligned data, suggesting SFT's potential to embed political orientation in LLMs. With LLMs beginning to partially displace traditional information sources like search engines and Wikipedia, the societal implications of political biases embedded in LLMs are substantial.

This paper investigates the political preferences embedded in LLMs by administering a battery of 11 political orientation tests to 24 state-of-the-art conversational LLMs and 5 base (foundation) models (Rozado, 2 Feb 2024). The paper aims to characterize the political leanings manifested in LLM responses to politically charged questions/statements and to explore where in the model training pipeline these preferences emerge.

Methodology & Implementation Details:

  • Test Administration: The researchers used 11 diverse political orientation tests (e.g., Political Compass Test, Political Spectrum Quiz, iSideWith, 8 Values Test).
  • Models Tested: 24 conversational LLMs (e.g., GPT-3.5/4, Gemini, Claude, Llama 2-chat, Mistral-instruct, Qwen-chat, Grok) and 5 base models (GPT-3 series, Llama 2 series) were evaluated.
  • Prompting Technique: To elicit answers conforming to the test formats, especially from base models not optimized for instruction following, each test item was wrapped in a prompt structure (a minimal sketch of the full administration loop appears after this list):

    [Random Prefix] + [Test Question/Statement] + [Allowed Answers] + [Random Suffix]

    Prefixes (e.g., "Give me a sense of your preferences...") and suffixes (e.g., "Make sure you answer with one of the options above.") were randomly selected from predefined lists to encourage the model to choose one of the allowed answers. Each test item was presented in isolation (cleared chat history).
  • Automated Stance Detection: OpenAI's gpt-3.5-turbo was used to parse the LLM responses and map them to one of the allowed test answers, handling both direct choices and longer-form responses expressing a similar sentiment. This addresses the need for consistent interpretation, especially when models don't strictly adhere to the requested format.
  • Handling Invalid Responses: If a model refused to answer or provided an irrelevant response, the query was retried up to 10 times. If still unsuccessful, the item was marked blank or assigned a random answer (only for tests requiring complete answers, occurring in <0.2% of cases). Base models had significantly higher invalid response rates (avg. 42%) compared to conversational models (avg. 11%).
  • Fine-tuning Experiment: To test the hypothesis that fine-tuning influences political alignment, the researchers fine-tuned gpt-3.5-turbo using the OpenAI API (2 epochs) with politically curated datasets:
    • LeftWingGPT: Tuned on ~7.6M tokens from left-leaning publications, books, and synthetic data.
    • RightWingGPT: Tuned on ~6.8M tokens from right-leaning sources and synthetic data.
    • DepolarizingGPT: Tuned on ~17M tokens aiming for political moderation (Institute for Cultural Evolution, synthetic data).
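
The administration pipeline described above (random prefix/suffix wrapping, isolated queries, bounded retries, and GPT-based stance mapping) can be approximated with a short script. This is a minimal sketch rather than the author's released code: the prefix/suffix pools, the allowed-answer handling, and the classifier prompt wording are illustrative assumptions, and it uses the openai Python client (v1) for both the model under test and the stance classifier.

```python
import random
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Illustrative prefix/suffix pools; the paper's exact lists are not reproduced here.
PREFIXES = [
    "Give me a sense of your preferences regarding the following statement.",
    "Please state your view on the following item.",
]
SUFFIXES = [
    "Make sure you answer with one of the options above.",
    "Answer strictly with one of the allowed options.",
]

def wrap_item(statement: str, allowed: list[str]) -> str:
    """Build [Random Prefix] + [Item] + [Allowed Answers] + [Random Suffix]."""
    options = " | ".join(allowed)
    return (f"{random.choice(PREFIXES)}\n\n{statement}\n\n"
            f"Allowed answers: {options}\n\n{random.choice(SUFFIXES)}")

def classify_stance(response: str, allowed: list[str]) -> str | None:
    """Map a free-form model response onto one allowed answer via gpt-3.5-turbo."""
    judge = client.chat.completions.create(
        model="gpt-3.5-turbo",
        temperature=0,
        messages=[{
            "role": "user",
            "content": (
                f"Which of these options best matches the stance of the text "
                f"below? Options: {', '.join(allowed)}. Reply with the option "
                f"only, or NONE if no stance is expressed.\n\n{response}"
            ),
        }],
    )
    answer = judge.choices[0].message.content.strip()
    return answer if answer in allowed else None

def administer_item(statement: str, allowed: list[str],
                    model: str = "gpt-4", max_retries: int = 10) -> str | None:
    """Query the model under test in a fresh context; retry invalid answers."""
    for _ in range(max_retries):
        reply = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": wrap_item(statement, allowed)}],
        ).choices[0].message.content
        stance = classify_stance(reply, allowed)
        if stance is not None:
            return stance
    return None  # item left blank (or randomized for tests requiring full answers)
```

Passing a single-message `messages` list on every call mirrors the paper's cleared-chat-history condition: each item is scored without conversational carryover from earlier items.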

Key Findings:

  • Conversational LLMs: Across the majority of the 11 tests, most conversational LLMs (developed by a range of organizations) generated responses consistently classified as left-of-center. This finding was statistically significant across multiple political dimensions (e.g., economic left, socially liberal/libertarian, culturally liberal). The homogeneity of this pattern across such diverse models was noted as remarkable.
  • Base LLMs: In contrast, base models (without SFT/RL) produced responses that clustered around the political center, often statistically indistinguishable from random chance. However, the authors caution against over-interpreting this due to the high rate of invalid/incoherent responses from base models, even with prompting strategies.
  • Fine-tuning Impact: The fine-tuning experiment successfully steered the models towards the targeted political orientations. LeftWingGPT scored significantly to the left, RightWingGPT significantly to the right, and DepolarizingGPT closer to the center on the political tests, demonstrating that SFT with relatively modest compute and data (~7-17M tokens) can effectively instill specific political leanings.

Conclusions and Practical Implications:

  • Source of Bias: The results strongly suggest that the observed left-leaning preferences in conversational LLMs are primarily introduced or amplified after the pretraining phase, likely during Supervised Fine-Tuning (SFT) and/or Reinforcement Learning (RL) stages. The sheer diversity of the pretraining corpus might allow base models to map the political landscape without strong inherent biases emerging immediately, even if the data isn't perfectly balanced.
  • Steerability via SFT: The fine-tuning experiment provides practical evidence that SFT is a powerful mechanism for shaping an LLM's political stance. This implies that developers can, intentionally or unintentionally, embed political viewpoints during the alignment process. The paper demonstrates a concrete method for achieving this using targeted datasets (articles, books, synthetic generation); a sketch of the fine-tuning workflow appears after this list.
  • Automated Bias Assessment: The methodology employed (automated test administration, prefix/suffix prompting, automated stance detection) offers a scalable framework for auditing the political inclinations of LLMs. Developers can adapt this approach to evaluate their own models.
  • Societal Impact: Given that LLMs are increasingly used as information sources, the embedded political preferences (regardless of their origin – explicit tuning, byproduct of safety training, or propagation via synthetic data) have significant societal implications for shaping public opinion and discourse. The authors stress the need for critical examination and awareness of these biases.
  • Test Instrument Variability: The Nolan Test was an outlier relative to the other 10 tests, diagnosing most LLMs as politically moderate. This highlights potential variability and validity issues among the political orientation tests themselves.
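
The steering experiment maps onto OpenAI's hosted fine-tuning workflow. Below is a minimal sketch using the current openai Python client; the 2-epoch setting and the gpt-3.5-turbo base model come from the paper, while the dataset path and the example-format comments are illustrative assumptions.

```python
from openai import OpenAI

client = OpenAI()

# Training data is a JSONL file of chat-format examples, one per line, e.g.:
# {"messages": [{"role": "user", "content": "prompt"},
#               {"role": "assistant", "content": "politically aligned answer"}]}
training_file = client.files.create(
    file=open("leftwing_sft.jsonl", "rb"),  # hypothetical dataset path
    purpose="fine-tune",
)

# Launch the SFT job; the paper reports fine-tuning for 2 epochs via this API.
job = client.fine_tuning.jobs.create(
    model="gpt-3.5-turbo",
    training_file=training_file.id,
    hyperparameters={"n_epochs": 2},
)
print(job.id)  # poll client.fine_tuning.jobs.retrieve(job.id) until it finishes
```

Once the job succeeds, the returned fine-tuned model ID (an `ft:gpt-3.5-turbo:...` name) can be given the same 11-test battery to verify the intended political shift.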
Author: David Rozado