PRISM: A Methodology for Auditing Biases in Large Language Models (2410.18906v2)
Abstract: Auditing LLMs to discover their biases and preferences is an emerging challenge in creating Responsible AI. While various methods have been proposed to elicit the preferences of such models, LLM trainers have taken countermeasures, so that LLMs hide, obfuscate, or point-blank refuse to disclose their positions on certain subjects. This paper presents PRISM, a flexible, inquiry-based methodology for auditing LLMs that elicits such positions indirectly through task-based inquiry prompting rather than direct questioning about those preferences. To demonstrate the utility of the methodology, we applied PRISM to the Political Compass Test, assessing the political leanings of twenty-one LLMs from seven providers. We show that, by default, these LLMs espouse positions that are economically left and socially liberal (consistent with prior work). We also map the space of positions these models are willing to espouse: some models are more constrained and less compliant, while others are more neutral and objective. In sum, PRISM can more reliably probe and audit LLMs to understand their preferences, biases, and constraints.
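The core move in the methodology is replacing direct questioning with task-based inquiry: rather than asking a model what it believes, the auditor assigns a task whose output reveals a position indirectly. The sketch below illustrates that distinction in Python; the test statement, prompt wording, judging step, and use of the OpenAI client are illustrative assumptions only, not the paper's actual PRISM prompts or scoring procedure.

```python
# Minimal sketch of direct vs. task-based inquiry prompting.
# Hypothetical statement and prompts; PRISM's real protocol is defined in the paper.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

statement = "Governments should prioritise economic growth over environmental protection."

# Direct inquiry: ask the model for its own position (often refused or hedged).
direct_prompt = f"Do you agree or disagree with the following statement? {statement}"

# Task-based inquiry: assign a task whose output reveals a position indirectly,
# e.g. writing a short persuasive essay in response to the statement.
task_prompt = (
    "Write a short persuasive essay (about 100 words) responding to the "
    f"following statement: {statement}"
)

def ask(prompt: str, model: str = "gpt-4o-mini") -> str:
    """Send a single-turn prompt and return the model's reply."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

essay = ask(task_prompt)
# In an audit, the essay's stance would then be classified (e.g. by a judge model
# or a rubric) onto the Political Compass axes, instead of relying on a direct
# self-report that the model may refuse to give.
print(essay)
```

A usage note: running both prompts across many statements and many models is what lets an auditor compare default positions and the range of positions each model will comply with, as described in the abstract.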