Towards Measuring and Modeling "Culture" in LLMs: A Survey (2403.15412v5)

Published 5 Mar 2024 in cs.CY, cs.AI, and cs.CL

Abstract: We present a survey of more than 90 papers that aim to study cultural representation and inclusion in LLMs. We observe that none of the studies explicitly define "culture," which is a complex, multifaceted concept; instead, they probe the models on some specially designed datasets which represent certain aspects of "culture". We call these aspects the proxies of culture, and organize them across two dimensions of demographic and semantic proxies. We also categorize the probing methods employed. Our analysis indicates that only certain aspects of "culture," such as values and objectives, have been studied, leaving several other interesting and important facets, especially the multitude of semantic domains (Thompson et al., 2020) and aboutness (Hershcovich et al., 2022), unexplored. Two other crucial gaps are the lack of robustness of probing techniques and situated studies on the impact of cultural mis- and under-representation in LLM-based applications.

Measuring and Modeling Culture in LLMs: A Survey Overview

The paper "Towards Measuring and Modeling 'Culture' in LLMs: A Survey" provides a comprehensive examination of the intersection between culture and LLMs, focusing on the evaluation of cultural representation, inclusion, and bias. It scrutinizes 39 papers dedicated to this purpose, highlighting the existing methodology, results, and gaps in the current body of literature. The survey underscores the complexity of defining "culture," noting that none of the reviewed papers provide a conclusive definition, instead relying on various cultural proxies within their datasets.

Cultural Proxies and Dimensions

The paper organizes the study of culture along three main dimensions: demographic proxies, semantic proxies, and axes of language-culture interaction.

  • Demographic Proxies: This dimension includes aspects such as region, language, gender, race, religion, and ethnicity. Region and language often serve as prevalent proxies for culture, but the paper notes that cultural studies involving other dimensions like gender and ethnicity are influenced significantly by Western-centric diversity narratives.
  • Semantic Proxies: While the majority of studies focus on semantic proxies like emotions and values, the survey identifies a lack of research across the full spectrum of semantic domains, such as kinship terms or physical world concepts.
  • Language-Culture Interaction: Based on the framework of Hershcovich et al. (2022), this dimension categorizes interactions into aboutness, common ground, and objectives/values. The authors find that many papers concentrate on objectives and values, while aboutness remains largely unexamined. (An illustrative sketch after this list shows how a single probing item can be tagged along these dimensions.)
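To make the taxonomy concrete, the sketch below shows how a single probing item might be tagged along these dimensions. This is an illustration only, not a schema from the paper; the class, field names, and example values are all assumptions.

```python
from dataclasses import dataclass

# Illustrative only: the survey defines these proxy dimensions conceptually,
# not as a data schema. All names and values here are assumptions.

@dataclass
class CultureProbeItem:
    text: str                # the prompt shown to the LLM
    demographic_proxy: str   # e.g. "region", "language", "gender", "religion"
    demographic_value: str   # e.g. "Indonesia", "Bengali"
    semantic_proxy: str      # e.g. "values", "emotions", "kinship terms"
    interaction_axis: str    # "aboutness", "common ground", or "objectives/values"

item = CultureProbeItem(
    text="Is it polite to address a teacher by their first name?",
    demographic_proxy="region",
    demographic_value="Indonesia",
    semantic_proxy="social norms",
    interaction_axis="common ground",
)
print(item.demographic_value, "/", item.interaction_axis)
```

Tagging items this way also makes the survey's coverage claims checkable: counting items per semantic proxy and interaction axis immediately reveals which cells, such as aboutness, are underpopulated.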

Methodologies for Probing Culture in LLMs

The survey categorizes the methodologies used to assess culture within LLMs into black-box and white-box approaches. The predominant method is black-box probing, where LLMs are queried with culture-specific prompts and their responses are analyzed. These techniques are sub-categorized into discriminative probing, where models select from given options, and generative probing, which involves free-text generation by the models. The authors critique the robustness of current probing methods, highlighting issues such as sensitivity to prompt wording and limited interpretability; both probing styles and a simple consistency check are sketched below.
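The following is a minimal sketch of the two black-box probing styles plus a crude paraphrase-consistency check. It assumes a hypothetical query_llm wrapper around whatever model API is in use; none of the prompts or helper names come from the surveyed papers.

```python
# Minimal sketch of black-box cultural probing. `query_llm` is a
# hypothetical stand-in for any chat/completions API client.

def query_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model API client here")

def discriminative_probe(statement: str, options: list[str]) -> str:
    """Discriminative probing: the model must pick one of the given
    options, so answers can be scored automatically against a reference."""
    prompt = (
        f"Statement: {statement}\n"
        f"Choose exactly one of: {', '.join(options)}\n"
        "Answer:"
    )
    return query_llm(prompt).strip()

def generative_probe(question: str) -> str:
    """Generative probing: the model answers in free text, which is then
    analyzed (e.g., by annotators or classifiers) for cultural content."""
    return query_llm(f"{question}\nAnswer in one short paragraph.")

def paraphrase_consistent(statement: str, options: list[str],
                          paraphrases: list[str]) -> bool:
    """Run the same item under several paraphrased wordings; disagreement
    is a symptom of the prompt sensitivity the survey critiques."""
    answers = {discriminative_probe(p, options)
               for p in [statement, *paraphrases]}
    return len(answers) == 1
```

A probe judged on a single prompt wording can flip its answer under a paraphrase, so checking consistency across several wordings is the simplest guard against over-reading one response.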

Identified Gaps and Recommendations

The paper identifies three critical gaps: (1) limited exploration and coverage of cultural facets, with most work focusing on values and norms; (2) limited robustness and reliability of probing methods; and (3) an absence of contextual, situated studies evaluating practical LLM applications. To address these gaps, the authors offer several recommendations:

  • Definitional Clarifications: Future research should clearly specify the cultural proxies and situate studies within a broader cultural context.
  • Diverse Cultural Domains: There is a need for wider exploration across various semantic domains and linguistic-cultural interactions.
  • Interdisciplinary Collaboration: Collaborating with anthropology, HCI, and ICTD could offer deeper insight into cultural nuances.
  • Increased Focus on Multilingual Datasets: More culturally nuanced datasets that are not mere translations should be developed to better reflect and study cultural interactions in LLMs.

Conclusion

This survey provides a critical assessment of the current state of research on culture in LLMs, offering a foundational taxonomy and identifying methodological and conceptual weaknesses in existing work. The paper makes crucial strides toward understanding how LLMs interact with multifaceted cultural aspects and offers a blueprint for future research aimed at achieving better cultural representation and inclusion in AI systems.

References (73)
  1. Peer-to-peer in the workplace: A view from the road. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, CHI ’16, page 5063–5075, New York, NY, USA. Association for Computing Machinery.
  2. SODAPOP: Open-ended discovery of social biases in social commonsense reasoning models. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, pages 1573–1596, Dubrovnik, Croatia. Association for Computational Linguistics.
  3. Restoring and attributing ancient texts using deep neural networks. Nature, 603:280–283.
  4. Training a helpful and harmless assistant with reinforcement learning from human feedback.
  5. Constitutional AI: Harmlessness from AI feedback.
  6. Social commonsense for explanation and cultural bias discovery. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, pages 3745–3760, Dubrovnik, Croatia. Association for Computational Linguistics.
  7. On the dangers of stochastic parrots: Can language models be too big? In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, FAccT ’21, page 610–623, New York, NY, USA. Association for Computing Machinery.
  8. Janet Blake. 2000. On defining the cultural heritage. The International and Comparative Law Quarterly, 49(1):61–85.
  9. Language (technology) is power: A critical survey of “bias” in NLP. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 5454–5476, Online. Association for Computational Linguistics.
  10. Workflow from within and without: Technology and cooperative work on the print industry shopfloor. In European Conference on Computer Supported Cooperative Work.
  11. Cultural Adaptation of Recipes. Transactions of the Association for Computational Linguistics, 12:80–99.
  12. Assessing cross-cultural alignment between ChatGPT and human societies: An empirical study. In Proceedings of the First Workshop on Cross-Cultural Considerations in NLP (C3NLP), pages 53–67, Dubrovnik, Croatia. Association for Computational Linguistics.
  13. Michael Castelle. 2022. Sapir’s thought-grooves and Whorf’s tensors: Reconciling transformer architectures with cultural anthropology. In Cultures in AI/AI in Culture, A NeurIPS 2022 Workshop. University of Warwick, Centre for Interdisciplinary Methodologies.
  14. Sociocultural norm similarities and differences via situational alignment and explainable textual entailment. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 3548–3564, Singapore. Association for Computational Linguistics.
  15. Jan Cieciuch and Shalom Schwartz. 2012. The number of distinct basic values and their structure assessed by PVQ-40. Journal of Personality Assessment, 94:321–8.
  16. Toward cultural bias evaluation datasets: The case of Bengali gender, religious, and national identity. In Proceedings of the First Workshop on Cross-Cultural Considerations in NLP (C3NLP), pages 68–83, Dubrovnik, Croatia. Association for Computational Linguistics.
  17. Building socio-culturally inclusive stereotype resources with community engagement.
  18. Towards measuring the representation of subjective global opinions in language models.
  19. EtiCor: Corpus for analyzing LLMs for etiquettes. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 6921–6931, Singapore. Association for Computational Linguistics.
  20. Lance Eliot. 2022. AI ethics and the future of where large language models are heading. Forbes.
  21. EVS/WVS. 2022. Joint EVS/WVS 2017–2022 dataset (Joint EVS/WVS). GESIS, Cologne. ZA7505 Data file Version 4.0.0, https://doi.org/10.4232/1.14023.
  22. From pretraining data to language models to downstream tasks: Tracking the trails of political biases leading to unfair NLP models. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 11737–11762, Toronto, Canada. Association for Computational Linguistics.
  23. NORMSAGE: Multi-lingual multi-cultural norm discovery from conversations on-the-fly. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 15217–15230, Singapore. Association for Computational Linguistics.
  24. Improving alignment of dialogue agents via targeted human judgements.
  25. Greg Gondwe. 2023. ChatGPT and the Global South: How are journalists in sub-Saharan Africa engaging with generative AI? Online Media and Global Communication, 2.
  26. Self-assessment tests are unreliable measures of LLM personality.
  27. Challenges and strategies in cross-cultural NLP. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 6997–7013, Dublin, Ireland. Association for Computational Linguistics.
  28. G. Hofstede. 1984. Culture’s Consequences: International Differences in Work-Related Values. Cross Cultural Research and Methodology. SAGE Publications.
  29. Jing Huang and Diyi Yang. 2023a. Culturally aware natural language inference. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 7591–7609, Singapore. Association for Computational Linguistics.
  30. Jing Huang and Diyi Yang. 2023b. Culturally aware natural language inference. In The 2023 Conference on Empirical Methods in Natural Language Processing.
  31. SeeGULL: A stereotype benchmark with broad geo-cultural coverage leveraging generative models. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 9851–9870, Toronto, Canada. Association for Computational Linguistics.
  32. Can machines learn morality? The Delphi experiment.
  33. The ghost in the machine has an American accent: Value conflict in GPT-3.
  34. Multi-lingual and multi-cultural figurative language understanding. In Findings of the Association for Computational Linguistics: ACL 2023, pages 8269–8284, Toronto, Canada. Association for Computational Linguistics.
  35. Making chat at home in the hospital: Exploring chat use by nurses. Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems.
  36. Evaluating the diversity, equity, and inclusion of NLP technology: A case study for Indian languages. In Findings of the Association for Computational Linguistics: EACL 2023, pages 1763–1777, Dubrovnik, Croatia. Association for Computational Linguistics.
  37. Large language models only pass primary school exams in Indonesia: A comprehensive test on IndoMMLU. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 12359–12374, Singapore. Association for Computational Linguistics.
  38. Large language models as superpositions of cultural perspectives.
  39. A Cultural Approach to Interpersonal Communication: Essential Readings. Wiley.
  40. Cristina Mora. 2013. Cultures and organizations: Software of the mind: Intercultural cooperation and its importance for survival. Journal of Media Research, 6(1):65.
  41. Global Voices, local biases: Socio-cultural prejudices across languages. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 15828–15845, Singapore. Association for Computational Linguistics.
  42. Theory of Culture. New directions in cultural analysis. University of California Press.
  43. StereoSet: Measuring stereotypical bias in pretrained language models. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 5356–5371, Online. Association for Computational Linguistics.
  44. CrowS-pairs: A challenge dataset for measuring social biases in masked language models. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1953–1967, Online. Association for Computational Linguistics.
  45. Having beer after prayer? Measuring cultural bias in large language models.
  46. Extracting cultural commonsense knowledge at scale. In Proceedings of the ACM Web Conference 2023, WWW ’23. ACM.
  47. At home with the technology: an ethnographic study of a set-top-box trial. ACM Trans. Comput. Hum. Interact., 6(3):282–308.
  48. Shramay Palta and Rachel Rudinger. 2023. FORK: A bite-sized test set for probing culinary cultural biases in commonsense reasoning models. In Findings of the Association for Computational Linguistics: ACL 2023, pages 9952–9962, Toronto, Canada. Association for Computational Linguistics.
  49. Talcott Parsons. 1972. Culture and social system revisited. Social Science Quarterly, pages 253–266.
  50. RiSAWOZ: A large-scale multi-domain Wizard-of-Oz dataset with rich semantic annotations for task-oriented dialogue modeling. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 930–940, Online. Association for Computational Linguistics.
  51. Aida Ramezani and Yang Xu. 2023. Knowledge of cultural moral norms in large language models. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 428–446, Toronto, Canada. Association for Computational Linguistics.
  52. Ethical reasoning over moral alignment: A case and framework for in-context ethical policies in LLMs. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 13370–13388, Singapore. Association for Computational Linguistics.
  53. Development in Judging Moral Issues. University of Minnesota Press.
  54. Re-imagining algorithmic fairness in india and beyond. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, FAccT ’21, page 315–328, New York, NY, USA. Association for Computing Machinery.
  55. NLPositionality: Characterizing design biases of datasets and models. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 9080–9102, Toronto, Canada. Association for Computational Linguistics.
  56. Large pre-trained language models contain human-like biases of what is right and wrong to do. Nature Machine Intelligence, 4:258–268.
  57. Quantifying language models’ sensitivity to spurious features in prompt design or: How I learned to start worrying about prompt formatting.
  58. Modeling cross-cultural pragmatic inference with codenames duet. In Findings of the Association for Computational Linguistics: ACL 2023, pages 6550–6569, Toronto, Canada. Association for Computational Linguistics.
  59. The Cultural Psychology of Development: One Mind, Many Mentalities, volume 1.
  60. Everything you need to know about multilingual LLMs: Towards fair, performant and reliable models for languages of the world. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 6: Tutorial Abstracts), pages 21–26, Toronto, Canada. Association for Computational Linguistics.
  61. Value kaleidoscope: Engaging AI with pluralistic human values, rights, and duties.
  62. A word on machine ethics: A response to Jiang et al. (2021).
  63. Probing the moral development of large language models through defining issues test.
  64. Bill Thompson, Seán G. Roberts, and Gary Lupyan. 2020. Cultural influences on word meanings revealed through large-scale semantic alignment. Nature Human Behaviour, 4(10):1029–1038.
  65. Silvia Vaccino-Salvadore. 2023. Exploring the ethical dimensions of using ChatGPT in language learning and beyond. Languages, 8(3).
  66. Are personalized stochastic parrots more dangerous? evaluating persona biases in dialogue systems. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 9677–9705, Singapore. Association for Computational Linguistics.
  67. SeaEval for multilingual foundation models: From cross-lingual alignment to cultural reasoning.
  68. COPAL-ID: Indonesian language reasoning with local culture and nuances.
  69. Gradient-based language model red teaming.
  70. Cross-cultural analysis of human values, morals, and biases in folk tales. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 5113–5125, Singapore. Association for Computational Linguistics.
  71. From instructions to intrinsic human values – a survey of alignment goals for big models.
  72. The skipped beat: A study of sociopragmatic understanding in LLMs for 64 languages. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 2630–2662, Singapore. Association for Computational Linguistics.
  73. Cultural compass: Predicting transfer learning success in offensive language detection with cultural features. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 12684–12702, Singapore. Association for Computational Linguistics.
Authors (8)
  1. Muhammad Farid Adilazuarda (14 papers)
  2. Sagnik Mukherjee (13 papers)
  3. Pradhyumna Lavania (2 papers)
  4. Siddhant Singh (7 papers)
  5. Alham Fikri Aji (94 papers)
  6. Jacki O'Neill (4 papers)
  7. Ashutosh Modi (60 papers)
  8. Monojit Choudhury (66 papers)