Towards Measuring the Representation of Subjective Global Opinions in Language Models (2306.16388v2)
Abstract: LLMs may not equitably represent diverse global perspectives on societal issues. In this paper, we develop a quantitative framework to evaluate whose opinions model-generated responses are more similar to. We first build a dataset, GlobalOpinionQA, comprising questions and answers from cross-national surveys designed to capture diverse opinions on global issues across different countries. Next, we define a metric that quantifies the similarity between LLM-generated survey responses and human responses, conditioned on country. With our framework, we run three experiments on an LLM trained to be helpful, honest, and harmless with Constitutional AI. By default, LLM responses tend to be more similar to the opinions of certain populations, such as those from the USA and some European and South American countries, highlighting the potential for biases. When we prompt the model to consider a particular country's perspective, responses shift to become more similar to the opinions of the prompted populations, but can reflect harmful cultural stereotypes. When we translate GlobalOpinionQA questions into a target language, the model's responses do not necessarily become the most similar to the opinions of speakers of those languages. We release our dataset for others to use and build on. Our data is at https://huggingface.co/datasets/Anthropic/LLM_global_opinions. We also provide an interactive visualization at https://LLMglobalvalues.anthropic.com.
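To make the framework concrete, here is a minimal sketch (not the authors' released code) of the kind of per-country comparison the abstract describes: scoring how similar a model's distribution over a survey question's answer options is to each country's aggregated human responses. The use of 1 minus the Jensen-Shannon distance as the similarity metric is an assumption based on common practice for comparing answer distributions; the `country_similarity` helper and all probability values are illustrative placeholders, and the dataset id is taken from the URL above.

```python
# Hedged sketch of a country-conditioned similarity score for one question.
# Assumed metric: 1 - Jensen-Shannon distance (base 2), so scores lie in [0, 1].
import numpy as np
from scipy.spatial.distance import jensenshannon

# The real questions and human response distributions could be loaded with,
# e.g., datasets.load_dataset("Anthropic/LLM_global_opinions"); hard-coded
# numbers are used here so the sketch runs standalone.

def country_similarity(model_probs: np.ndarray, human_probs: np.ndarray) -> float:
    """Similarity between the model's distribution over answer options and one
    country's aggregated human response distribution for the same question."""
    # jensenshannon returns the JS *distance* (the square root of the JS
    # divergence); with base=2 it is bounded by 1, so higher values of
    # 1 - distance mean the model's answers look more like that population's.
    return 1.0 - jensenshannon(model_probs, human_probs, base=2)

# Illustrative numbers for a hypothetical 4-option survey question.
model_probs = np.array([0.60, 0.25, 0.10, 0.05])          # hypothetical model output
human_probs_by_country = {
    "United States": np.array([0.55, 0.30, 0.10, 0.05]),  # hypothetical survey data
    "Pakistan":      np.array([0.10, 0.20, 0.40, 0.30]),
}

for country, probs in human_probs_by_country.items():
    print(f"{country}: {country_similarity(model_probs, probs):.3f}")
```

Under this reading, the paper's three experiments all reduce to re-scoring: default prompting, prompting the model to adopt a particular country's perspective, or translating the question each produce a new model answer distribution, which is then compared against every country's human distribution.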
Authors: Esin Durmus, Thomas I. Liao, Nicholas Schiefer, Amanda Askell, Anton Bakhtin, Carol Chen, Zac Hatfield-Dodds, Danny Hernandez, Nicholas Joseph, Liane Lovitt, Sam McCandlish, Orowa Sikder, Alex Tamkin, Janel Thamkul, Jared Kaplan, Jack Clark, Deep Ganguli, Karina Nguyen