Consistency of questionnaire-based vs. roll-call-based LLM ideology evaluations over time
Determine whether two kinds of evaluation of large language model ideological tendencies continue to yield the same ideological patterns as model architectures and training pipelines evolve: evaluations based on questionnaire-style instruments (e.g., political compass tests and voting advice applications) and evaluations based on alignment with parliamentary roll-call voting records.
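One possible way to make "the same ideological patterns" operational, offered only as an illustrative sketch and not as a method taken from the source, is to compare how the two instruments order models within each model generation and to track that agreement across generations. The sketch below uses Spearman rank correlation for this. It assumes that each model has one questionnaire-based score and one roll-call-based score, and that both are oriented in the same direction (for example, higher means further right). The function names, the generation labels, the model names, and every number are hypothetical placeholders.

```python
from typing import Dict, List, Optional

# generation label -> {model name -> score}; a hypothetical data layout.
Scores = Dict[str, Dict[str, float]]


def average_ranks(values: List[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; raises ValueError if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(x: List[float], y: List[float]) -> float:
    """Spearman rank correlation, computed as Pearson correlation of ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length inputs with at least 2 items")
    return pearson(average_ranks(x), average_ranks(y))


def agreement_by_generation(
    questionnaire: Scores, roll_call: Scores
) -> Dict[str, Optional[float]]:
    """Per generation, correlate the two instruments over shared models.

    A generation maps to None when the correlation cannot be computed
    (fewer than two shared models, or constant scores).
    """
    result: Dict[str, Optional[float]] = {}
    for generation in sorted(set(questionnaire) & set(roll_call)):
        q, r = questionnaire[generation], roll_call[generation]
        models = sorted(set(q) & set(r))
        try:
            result[generation] = spearman(
                [q[m] for m in models], [r[m] for m in models]
            )
        except ValueError:
            result[generation] = None
    return result


# Made-up placeholder numbers for illustration only; these are NOT results.
EXAMPLE_QUESTIONNAIRE: Scores = {
    "gen_1": {"m1": -0.6, "m2": -0.2, "m3": 0.1},
    "gen_2": {"m4": -0.5, "m5": 0.0, "m6": 0.3},
}
EXAMPLE_ROLL_CALL: Scores = {
    "gen_1": {"m1": 0.2, "m2": 0.5, "m3": 0.7},
    "gen_2": {"m4": 0.6, "m5": 0.3, "m6": 0.4},
}

if __name__ == "__main__":
    print(agreement_by_generation(EXAMPLE_QUESTIONNAIRE, EXAMPLE_ROLL_CALL))
```

Under this framing, a correlation that stays high across generations would suggest the two instruments still agree, while a drop in later generations would suggest divergence. Rank correlation is only one possible choice. It ignores the magnitude of differences between models and is unstable when a generation contains few models, so any real analysis would need a defensible mapping of both instruments onto comparable scales and enough models per generation.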
References
As LLM architectures and training pipelines evolve, it remains an open question whether questionnaire-based and roll-call-based analyses will continue to yield the same patterns. This underlines the need for empirically grounded benchmarks and systematic evaluation frameworks that allow the field to track how ideological tendencies emerge, persist, or diverge in subsequent generations of LLMs.