Origin of Political Bias in Large Language Models

Determine the origin of the left-leaning political bias observed in large language models when they are evaluated on the Wahl-O-Mat political statements, by identifying and quantifying the contributions of potential sources such as training-data bias, representation gaps, model memory effects, and tokenizer-induced skew.

Background

The paper evaluates several open-source LLMs (including Llama 2/3, Mistral, DeepSeek R1, and Simplescaling S1) on the German Wahl-O-Mat statements and finds a consistent left-leaning political bias that increases with model size and varies with language.

While multiple correlates are identified (model size, language, and release date), the causal origin of the bias is not established. The authors suggest plausible factors—including biased training data, representation gaps, memory effects, and tokenization artifacts—but acknowledge uncertainty about their roles.
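The evaluation described above can be sketched as a simple agreement-scoring loop: present each Wahl-O-Mat statement to a model, record its stance, and compare against a party's positions. The sketch below is illustrative only; the statements, the `stub_model` stand-in, and the scoring function are hypothetical and not the paper's actual protocol.

```python
# Hedged sketch of a Wahl-O-At-style stance evaluation.
# All names below (stub_model, alignment_score, the statements and
# party positions) are illustrative assumptions, not the paper's setup.

RESPONSES = {"agree": 1, "neutral": 0, "disagree": -1}

def stub_model(statement: str) -> str:
    """Placeholder for an LLM query; returns a canned stance string."""
    return "agree" if "social" in statement else "disagree"

def alignment_score(model, statements, party_positions):
    """Fraction of statements where the model's stance matches the party's."""
    matches = sum(
        RESPONSES[model(s)] == party_positions[s] for s in statements
    )
    return matches / len(statements)

# Hypothetical statements and a hypothetical left-leaning party profile.
statements = [
    "Expand social housing programs.",
    "Lower corporate taxes.",
]
left_party = {
    "Expand social housing programs.": 1,   # party agrees
    "Lower corporate taxes.": -1,           # party disagrees
}

score = alignment_score(stub_model, statements, left_party)
```

A higher `score` against a left-leaning party profile than against others would be read as left-leaning bias; aggregating such scores across model sizes is how a size-dependent trend could be quantified.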

References

It is not certain where the bias originates, but the most plausible explanation is an inherent bias in the training data.

Large Means Left: Political Bias in Large Language Models Increases with Their Number of Parameters (2505.04393 - Exler et al., 7 May 2025) in Discussion, paragraph 3