Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models (2402.16786v2)
Abstract: Much recent work seeks to evaluate values and opinions in LLMs using multiple-choice surveys and questionnaires. Most of this work is motivated by concerns around real-world LLM applications. For example, politically biased LLMs may subtly influence society when they are used by millions of people. Such real-world concerns, however, stand in stark contrast to the artificiality of current evaluations: real users do not typically ask LLMs survey questions. Motivated by this discrepancy, we challenge the prevailing constrained evaluation paradigm for values and opinions in LLMs and explore more realistic unconstrained evaluations. As a case study, we focus on the popular Political Compass Test (PCT). In a systematic review, we find that most prior work using the PCT forces models to comply with the PCT's multiple-choice format. We show that models give substantively different answers when not forced; that answers change depending on how models are forced; and that answers lack paraphrase robustness. Then, we demonstrate that models give different answers yet again in a more realistic open-ended answer setting. We distill these findings into recommendations and open challenges in evaluating values and opinions in LLMs.
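To make the contrast between the two evaluation paradigms concrete, here is a minimal sketch of a constrained (forced-choice) and an unconstrained (open-ended) query on a single PCT proposition. This is an illustration under assumptions, not the paper's actual evaluation code: the model name, prompt wording, and decoding settings are placeholders.

```python
# Minimal sketch: constrained (forced-choice) vs. unconstrained (open-ended)
# evaluation of a chat LLM on one Political Compass Test proposition.
# Assumptions: the model name, prompt templates, and decoding settings below
# are illustrative placeholders, not the paper's actual setup.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="mistralai/Mistral-7B-Instruct-v0.2",  # assumed; any chat model works
)

statement = "The freer the market, the freer the people."  # a PCT proposition

# Constrained setting: force the model into the PCT's multiple-choice format.
forced_prompt = (
    f'Statement: "{statement}"\n'
    "Respond only with one of: Strongly disagree, Disagree, Agree, Strongly agree."
)

# Unconstrained setting: let the model answer freely, as a real user might ask.
open_prompt = f'What do you think about the following statement? "{statement}"'

for label, prompt in (("forced choice", forced_prompt), ("open-ended", open_prompt)):
    chat = [{"role": "user", "content": prompt}]
    result = generator(chat, max_new_tokens=128, do_sample=False)
    # For chat-format input, generated_text holds the conversation,
    # with the model's reply as the final message.
    print(f"--- {label} ---")
    print(result[0]["generated_text"][-1]["content"])
```

Note that the forced-choice instruction above is only one of several possible ways of forcing a model (e.g., different prompt templates or first-token probability readouts); per the abstract, both the choice of forcing and the paraphrase of the proposition can change the recorded answer.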
Authors: Paul Röttger, Valentin Hofmann, Valentina Pyatkin, Musashi Hinck, Hannah Rose Kirk, Hinrich Schütze, Dirk Hovy