Analysis of Political Bias in GPT-3.5 and GPT-4: A Comparative Study
The paper "Is GPT-4 Less Politically Biased than GPT-3.5? A Renewed Investigation of ChatGPT's Political Biases" by Weber et al. presents an empirical investigation into the political biases embedded within two prominent versions of OpenAI's LLMs: GPT-3.5 and GPT-4. Using standardized psychological and political assessments, the authors examine the extent to which these models deviate from political neutrality and whether they exhibit personality traits consistent with findings from human behavioral studies.
Methodological Approach
The paper employs a rigorous quantitative approach, administering the Political Compass Test and the Big Five Personality Test 100 times each across different prompt scenarios to evaluate the degree of political bias. These tests capture multidimensional personality traits and political leanings; the results are then analyzed with standard statistical methods, including mean score computation, standard deviation analysis, and significance testing with the non-parametric Brunner-Munzel test.
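To make this aggregation pipeline concrete, here is a minimal sketch in Python. The per-run scores are synthetic stand-ins (the means and spreads below are illustrative assumptions, not the paper's results); SciPy's `scipy.stats.brunnermunzel` implements the non-parametric test the authors use.

```python
import numpy as np
from scipy.stats import brunnermunzel

rng = np.random.default_rng(0)

# Hypothetical economic-axis scores from 100 runs of each model
# (placeholder values only; the paper's actual data are not reproduced here).
gpt35_econ = rng.normal(loc=-5.0, scale=0.8, size=100)
gpt4_econ = rng.normal(loc=-4.5, scale=0.6, size=100)

# Per-model descriptive statistics: mean and sample standard deviation.
for name, scores in [("GPT-3.5", gpt35_econ), ("GPT-4", gpt4_econ)]:
    print(f"{name}: mean={scores.mean():.2f}, std={scores.std(ddof=1):.2f}")

# Brunner-Munzel test: do the two models' score distributions differ?
stat, p = brunnermunzel(gpt35_econ, gpt4_econ)
print(f"Brunner-Munzel statistic={stat:.3f}, p-value={p:.4f}")
```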
Key Findings
Political Bias
Both GPT-3.5 and GPT-4 were found to exhibit a libertarian-left bias, with their average scores on the economic and social axes placing both models in the libertarian-left quadrant of the Political Compass. Despite OpenAI's attempts to mitigate bias, GPT-4 demonstrated only a marginal reduction in this bias compared to GPT-3.5. An important distinction lies in GPT-4's greater adaptability: when assigned a political quadrant, it could impersonate that stance accurately, a task at which GPT-3.5 underperformed.
This differential ability suggests that GPT-4 has an improved, although not fully neutral, mechanism for aligning with diverse political perspectives, which might be attributed to refined tuning processes or a broader contextual understanding.
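For readers unfamiliar with the Political Compass layout, the following sketch shows how axis scores map to quadrants under the test's usual convention (negative economic = left, negative social = libertarian). The helper function and example values are illustrative, not taken from the paper.

```python
def compass_quadrant(economic: float, social: float) -> str:
    """Map Political Compass axis scores to a quadrant label."""
    horizontal = "left" if economic < 0 else "right"
    vertical = "libertarian" if social < 0 else "authoritarian"
    return f"{vertical}-{horizontal}"

# Both models' reported placements correspond to negative scores on
# both axes, i.e. the libertarian-left quadrant:
print(compass_quadrant(-5.0, -4.0))  # libertarian-left
```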
Personality Traits
In terms of personality characteristics, GPT-3.5 exhibited high levels of Openness and Agreeableness, traits traditionally associated with progressive viewpoints. Although GPT-4 showed some moderation of these traits along with elevated Neuroticism, both models retained the same direction of bias, suggesting that the personality profiles these models simulate still align closely with findings from human behavioral research.
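As background on how such trait scores are typically produced, the sketch below follows a common Big Five scoring convention: each trait is the mean of its Likert-scale (1-5) items, with reverse-keyed items flipped. The item values and reverse-keyed indices here are illustrative assumptions; the specific inventory the authors administered is not reproduced.

```python
LIKERT_MAX = 5

def trait_score(responses, reverse_keyed=frozenset()):
    """Average Likert responses for one trait, flipping reverse-keyed items."""
    adjusted = [
        (LIKERT_MAX + 1 - r) if i in reverse_keyed else r
        for i, r in enumerate(responses)
    ]
    return sum(adjusted) / len(adjusted)

# Example: five hypothetical Openness items, with item index 2 reverse-keyed.
print(trait_score([5, 4, 2, 5, 4], reverse_keyed={2}))  # 4.4
```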
Implications
The paper’s findings have several theoretical and practical implications. Theoretically, the presence of consistent biases aligned with specific personality traits accentuates the inherent challenges in engineering fully neutral artificial intelligence models, especially those reliant on probabilistic language generation from extensive and diverse datasets. Practically, the persistent biases, albeit slightly reduced in GPT-4, necessitate caution in deploying these models in sensitive domains like politics or education, where impartiality is critical.
The ability of GPT-4 to emulate specified political stances more accurately than its predecessor suggests potential applications in controlled environments where mimicking diverse perspectives could be beneficial, albeit with mindful oversight to ensure ethical deployment.
Future Directions
The paper opens up pathways for future research, including comparative evaluations across models from other AI developers, such as Google Gemini or Meta's LLaMA. Such cross-comparisons could put generational improvements on a fair footing and delineate collective advances in bias reduction strategies across the industry's leading models. Further investigation is also warranted into the low-level mechanisms driving the models' behavior when emulating complex human traits. Finally, longitudinal examinations revisiting these models as they are updated would provide vital insight into whether political and personality biases diminish over time.
Overall, the paper underscores the nuanced, incremental trajectory of bias mitigation in LLMs and offers valuable insights for both AI developers and users in their continued efforts toward fairer and more reliable systems.