PRISM: A Methodology for Auditing Biases in Large Language Models (2410.18906v2)

Published 24 Oct 2024 in cs.CL, cs.AI, and cs.CY

Abstract: Auditing LLMs to discover their biases and preferences is an emerging challenge in creating Responsible AI. While various methods have been proposed to elicit the preferences of such models, countermeasures have been taken by LLM trainers, such that LLMs hide, obfuscate or point-blank refuse to disclose their positions on certain subjects. This paper presents PRISM, a flexible, inquiry-based methodology for auditing LLMs that seeks to elicit such positions indirectly through task-based inquiry prompting rather than direct inquiry of said preferences. To demonstrate the utility of the methodology, we applied PRISM to the Political Compass Test, where we assessed the political leanings of twenty-one LLMs from seven providers. We show that LLMs, by default, espouse positions that are economically left and socially liberal (consistent with prior work). We also show the space of positions that these models are willing to espouse, where some models are more constrained and less compliant than others, while others are more neutral and objective. In sum, PRISM can more reliably probe and audit LLMs to understand their preferences, biases and constraints.


Summary

  • The paper introduces PRISM, a novel indirect inquiry method that effectively audits hidden biases in large language models.
  • It applies the Political Compass Test across 21 models from seven providers, mapping their economic and social leanings with lower refusal rates than direct questioning.
  • The study demonstrates PRISM's robustness over direct methods, offering actionable insights for advancing responsible AI.

Auditing Biases in LLMs Using PRISM

The paper "PRISM: A Methodology for Auditing Biases in LLMs" introduces a novel approach to auditing LLMs for biases and preferences. This work addresses a crucial need in developing Responsible AI, focusing on the constraints and biases that may arise through the introduction of guardrails and training aimed at reducing harmful outputs.

Methodology Overview

PRISM stands for Preference Revelation through Indirect Stimulus Methodology. Rather than asking models directly about their preferences, questions that guardrail training often causes them to deflect or refuse, PRISM engages models in task-based inquiries: it asks them to write essays on specific statements and infers their positions from the resulting text, circumventing their reticence to reveal biases directly.

The paper demonstrates PRISM on the Political Compass Test (PCT), mapping political leanings across various LLMs. By analyzing the essays generated from prompts designed with and without assigned roles, the approach provides a window into the biases and inclinations of these models. Unlike direct methods, which ask models to declare their positions explicitly and consequently face high non-compliance, PRISM's indirect inquiry yields lower refusal rates and therefore a more complete picture of model biases.
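To make the task-based inquiry concrete, here is a minimal sketch of what one audit step could look like. It is only illustrative: the prompt wording, the stance labels, the `elicit_essay`/`assess_stance` helpers, and the use of the OpenAI chat API with `gpt-4o` are assumptions for demonstration, not the paper's exact protocol or providers.

```python
from openai import OpenAI

client = OpenAI()  # stands in for whichever provider's API is being audited


def elicit_essay(statement: str, role: str | None = None, model: str = "gpt-4o") -> str:
    """Indirect inquiry: ask the model to write a short essay about a statement
    rather than asking it to state its own position directly."""
    system = role or "You are a helpful assistant."
    user = (
        "Write a short persuasive essay (150-200 words) about the following "
        f'statement: "{statement}"'
    )
    resp = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
    )
    return resp.choices[0].message.content


def assess_stance(essay: str, statement: str, model: str = "gpt-4o") -> str:
    """Second step: a separate assessor call labels the essay's stance toward
    the statement. This label set is an illustrative choice, not the paper's rubric."""
    user = (
        f'Statement: "{statement}"\n\nEssay:\n{essay}\n\n'
        "Classify the essay's stance toward the statement as exactly one of: "
        "strongly agree, agree, neutral, disagree, strongly disagree, refusal."
    )
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": user}],
    )
    return resp.choices[0].message.content.strip().lower()


# Audit one Political Compass-style statement, first with the default persona
# and then with an assigned role, recording the inferred stance for each.
statement = "The rich are too highly taxed."
for role in (None, "You are writing as a staunch free-market economist."):
    essay = elicit_essay(statement, role=role)
    print(role or "default persona", "->", assess_stance(essay, statement))
```

Repeating this loop over all test statements, models, and roles yields, for each model, a set of stance labels that can then be mapped onto the compass axes.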

Implications and Findings

Applying PRISM to twenty-one models from seven providers revealed that, under default settings, the models predominantly take economically left and socially liberal positions. Notably, the range of political positions that models were willing to express varied significantly across models.

By mapping the window of political positions each model is willing to espouse, the paper highlights the variability in how bias manifests across LLMs. For instance, while some models such as GPT-4 showed a broad willingness to express diverse political views, others such as Llama 2 were more restrictive. Interestingly, all models tended to avoid expressing views consistent with strongly left-authoritarian or right-liberal positions.
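As a rough illustration of how such a position could be computed, the sketch below aggregates per-statement stance labels into a crude (economic, social) coordinate. The numeric scores, the simple averaging, and the assumption that each statement's polarity has already been normalised are all illustrative; the Political Compass Test's actual scoring is not reproduced here.

```python
# Illustrative mapping from stance labels to a numeric agreement score.
STANCE_SCORE = {
    "strongly agree": 2.0, "agree": 1.0, "neutral": 0.0,
    "disagree": -1.0, "strongly disagree": -2.0,
}


def compass_position(results):
    """results: iterable of (axis, stance) pairs, where axis is 'economic' or
    'social' and stance is one of the labels above (refusals are skipped).
    Assumes each statement has been polarity-normalised so that a positive
    score always pushes in the same direction on its axis. Returns the mean
    agreement per axis as a crude (economic, social) coordinate."""
    totals = {"economic": 0.0, "social": 0.0}
    counts = {"economic": 0, "social": 0}
    for axis, stance in results:
        if stance in STANCE_SCORE:
            totals[axis] += STANCE_SCORE[stance]
            counts[axis] += 1
    return tuple(
        totals[a] / counts[a] if counts[a] else 0.0
        for a in ("economic", "social")
    )


# Example: one model's stances on two economic and two social statements.
print(compass_position([
    ("economic", "disagree"), ("economic", "strongly disagree"),
    ("social", "agree"), ("social", "neutral"),
]))  # -> (-1.5, 0.5)
```

Computing one such coordinate per role or prompting condition traces out the window of positions a given model is willing to occupy.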

Comparative Evaluation

The paper's comparison of PRISM with direct methods demonstrates its superior efficacy and reliability. PRISM reduced the rates of refusal and neutral responses, underscoring its robustness in revealing both overt and covert biases. This is a critical contribution as LLMs increasingly become part of decision-critical applications.
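As a hedged sketch of how this comparison could be tabulated, the snippet below counts refusal and neutral labels under each prompting condition; the counting convention and the example labels are assumptions for illustration, not the paper's exact definitions or data.

```python
from collections import Counter


def response_rates(stances):
    """Fraction of stance labels that are refusals or neutral."""
    counts = Counter(stances)
    n = len(stances)
    return {
        "refusal_rate": counts["refusal"] / n,
        "neutral_rate": counts["neutral"] / n,
    }


# Hypothetical labels: direct questioning vs. PRISM-style essay prompting.
direct = ["refusal", "refusal", "neutral", "agree", "refusal"]
indirect = ["agree", "disagree", "neutral", "agree", "strongly agree"]
print("direct:  ", response_rates(direct))    # high refusal rate
print("indirect:", response_rates(indirect))  # lower refusal rate
```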

Future Directions

PRISM opens several investigative avenues:

  • Role-Based Audits: Extending role-based experiments could provide insights into stereotype conceptualization.
  • Bias Dimensions Exploration: Applying PRISM to other bias dimensions, such as gender and religion, could enrich understanding.
  • Longitudinal Studies: Monitoring bias evolution over time and across model iterations could reveal training impact patterns.
  • Prompt Strategy Refinement: Fine-tuning prompts for nuanced bias detection remains a development priority.

Limitations

Acknowledged limitations include potential AI assessor biases and the need for more comprehensive sampling and prompt strategies. The paper also notes the potential for obfuscation in LLM responses even when using indirect methods, which necessitates continuous improvement in PRISM's application.

In summary, PRISM offers a sophisticated, adaptive methodology for probing LLM biases and for assessing whether their expressed positions align with societal and organizational values. This approach sets a strong foundation for ongoing research and development in LLM auditing, fostering a more transparent and accountable AI landscape.
