Survey of Cultural Awareness in Language Models: Text and Beyond (2411.00860v1)

Published 30 Oct 2024 in cs.CL and cs.CV

Abstract: Large-scale deployment of LLMs in various applications, such as chatbots and virtual assistants, requires LLMs to be culturally sensitive to the user to ensure inclusivity. Culture has been widely studied in psychology and anthropology, and there has been a recent surge in research on making LLMs more culturally inclusive that goes beyond multilinguality and builds on findings from psychology and anthropology. In this paper, we survey efforts towards incorporating cultural awareness into text-based and multimodal LLMs. We start by defining cultural awareness in LLMs, taking the definitions of culture from anthropology and psychology as a point of departure. We then examine methodologies adopted for creating cross-cultural datasets, strategies for cultural inclusion in downstream tasks, and methodologies that have been used for benchmarking cultural awareness in LLMs. Further, we discuss the ethical implications of cultural alignment, the role of Human-Computer Interaction in driving cultural inclusion in LLMs, and the role of cultural alignment in driving social science research. We finally provide pointers to future research based on our findings about gaps in the literature.

References (473)
  1. PersianLLaMA: Towards Building First Persian Large Language Model. ArXiv preprint, abs/2312.15713.
  2. ArtEmis: Affective Language for Visual Art. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2021, virtual, June 19-25, 2021, pages 11569–11579, Computer Vision Foundation / IEEE.
  3. Towards Measuring and Modeling "Culture" in LLMs: A Survey. ArXiv preprint, abs/2403.15412.
  4. Ethical Reasoning and Moral Value Alignment of LLMs Depend on the Language We Prompt Them in. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 6330–6340, ELRA and ICCL, Torino, Italia.
  5. The Illusion of Artificial Inclusion. In Proceedings of the CHI Conference on Human Factors in Computing Systems, CHI 2024, Honolulu, HI, USA, May 11-16, 2024, pages 286:1–286:12, ACM.
  6. Are Generative Language Models Multicultural? A Study on Hausa Culture and Emotions using ChatGPT. In Proceedings of the 2nd Workshop on Cross-Cultural Considerations in NLP, pages 98–106, Association for Computational Linguistics, Bangkok, Thailand.
  7. Varepsilon kú mask: Integrating Yorùbá cultural greetings into machine translation. In Proceedings of the First Workshop on Cross-Cultural Considerations in NLP (C3NLP), pages 1–7, Association for Computational Linguistics, Dubrovnik, Croatia.
  8. AraTrust: An Evaluation of Trustworthiness for LLMs in Arabic. ArXiv preprint, abs/2403.09017.
  9. Investigating Cultural Alignment of Large Language Models. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 12404–12422, Association for Computational Linguistics, Bangkok, Thailand.
  10. Sabia-2: A New Generation of Portuguese Large Language Models. ArXiv preprint, abs/2403.09887.
  11. 101 Billion Arabic Words Dataset. ArXiv preprint, abs/2405.01590.
  12. CIDAR: Culturally Relevant Instruction Dataset For Arabic. In Findings of the Association for Computational Linguistics ACL 2024, pages 12878–12901, Association for Computational Linguistics, Bangkok, Thailand and virtual meeting.
  13. From Pampas to Pixels: Fine-Tuning Diffusion Models for Gaúcho Heritage. ArXiv preprint, abs/2401.05520.
  14. The MuSe 2024 Multimodal Sentiment Analysis Challenge: Social Perception and Humor Recognition. ArXiv preprint, abs/2406.07753.
  15. ExHuBERT: Enhancing HuBERT Through Block Extension and Fine-Tuning on 37 Emotion Datasets. ArXiv preprint, abs/2406.10275.
  16. VKIE: The Application of Key Information Extraction on Video Text. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 532–540, Association for Computational Linguistics, Singapore.
  17. See It from My Perspective: Diagnosing the Western Cultural Bias of Large Vision-Language Models in Image Understanding. ArXiv preprint, abs/2406.11665.
  18. VQA: Visual Question Answering. In 2015 IEEE International Conference on Computer Vision, ICCV 2015, Santiago, Chile, December 7-13, 2015, pages 2425–2433, IEEE Computer Society.
  19. Resources for Multilingual Hate Speech Detection. In Proceedings of the Sixth Workshop on Online Abuse and Harms (WOAH), pages 122–130, Association for Computational Linguistics, Seattle, Washington (Hybrid).
  20. Leveraging AI for Democratic Discourse: Chat Interventions Can Improve Online Political Conversations at Scale. Proceedings of the National Academy of Sciences, 120(41):e2311627120.
  21. Out of One, Many: Using Language Models to Simulate Human Samples. Political Analysis, 31(3):337–351.
  22. Probing Pre-Trained Language Models for Cross-Cultural Differences in Values. In Proceedings of the First Workshop on Cross-Cultural Considerations in NLP (C3NLP), pages 114–130, Association for Computational Linguistics, Dubrovnik, Croatia.
  23. CaLMQA: Exploring culturally specific long-form question answering across 23 languages. ArXiv preprint, abs/2406.17761.
  24. Arvai, Joseph. 2013. Thinking, fast and slow, Daniel Kahneman, Farrar, Straus & Giroux.
  25. Evaluating Visual and Cultural Interpretation: The K-Viscuit Benchmark with Human-VLM Collaboration. ArXiv preprint, abs/2406.16469.
  26. Coig-cqia: Quality is all you need for chinese instruction fine-tuning. ArXiv preprint, abs/2403.18058.
  27. Bail, Christopher A. 2024. Can Generative AI Improve Social Science? Proceedings of the National Academy of Sciences, 121(21):e2314021121.
  28. How well can text-to-image generative models understand ethical natural language interventions? In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 1358–1370, Association for Computational Linguistics, Abu Dhabi, United Arab Emirates.
  29. A bi-step grounding paradigm for large language models in recommendation systems. ArXiv preprint, abs/2308.08434.
  30. Bashkow, Ira. 2004. A neo-boasian conception of cultural boundaries. American Anthropologist, 106(3):443–458.
  31. Inspecting the Geographical Representativeness of Images from Text-to-Image Models. In IEEE/CVF International Conference on Computer Vision, ICCV 2023, Paris, France, October 1-6, 2023, pages 5113–5124, IEEE.
  32. Sensitivity, performance, robustness: Deconstructing the effect of sociodemographic prompting. In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2589–2615, Association for Computational Linguistics, St. Julian’s, Malta.
  33. Cosmocult card game: A methodological tool to understand the hybrid and peripheral cultural consumption of young people. Open Library of Humanities, 4(1).
  34. Belani, Ritu and Jeffrey Flanigan. 2022. Automatic identification of motivation for code-switching in speech transcripts. ArXiv preprint, abs/2212.08565.
  35. On the dangers of stochastic parrots: Can language models be too big? In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, FAccT ’21, page 610–623, Association for Computing Machinery, New York, NY, USA.
  36. Cultural Value Resonance in Folktales: A Transformer-Based Analysis with the World Value Corpus. In International Conference on Social Computing, Behavioral-Cultural Modeling and Prediction and Behavior Representation in Modeling and Simulation, pages 209–218, Springer.
  37. Recognizing Value Resonance with Resonance-Tuned RoBERTa Task Definition, Experimental Validation, and Robust Modeling. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 13688–13698, ELRA and ICCL, Torino, Italia.
  38. Cultural alignment in response to strategic organizational change: New considerations for a change framework. Journal of Managerial Issues, pages 474–490.
  39. Berger, Arthur Asa. 2004. Deconstructing travel: Cultural perspectives on tourism. Rowman Altamira.
  40. Genericskb: A knowledge base of generic statements. ArXiv preprint, abs/2005.00660.
  41. Bhatia, Mehar and Vered Shwartz. 2023. GD-COMET: A Geo-Diverse Commonsense Inference Model. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 7993–8001, Association for Computational Linguistics, Singapore.
  42. Seegull multilingual: a dataset of geo-culturally situated stereotypes. ArXiv preprint, abs/2403.05696.
  43. RecipeNLG: A cooking recipes dataset for semi-structured text generation. In Proceedings of the 13th International Conference on Natural Language Generation, pages 22–28, Association for Computational Linguistics, Dublin, Ireland.
  44. JASMINE: Arabic GPT Models for Few-Shot Learning. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 16721–16744, Association for Computational Linguistics, Singapore.
  45. Power to the people? opportunities and challenges for participatory ai. In Proceedings of the 2nd ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization, EAAMO ’22, Association for Computing Machinery, New York, NY, USA.
  46. Bloom, BS. 1956. handbook i: cognitive domain. David McKay Company.
  47. Visual question answering for cultural heritage. In IOP Conference Series: Materials Science and Engineering, volume 949, page 012074, IOP Publishing.
  48. Towards region-aware bias evaluation metrics. ArXiv preprint, abs/2406.16152.
  49. Investigating Human Values in Online Communities.
  50. Knowledge representation for culturally competent personal robots: requirements, design principles, implementation, and assessment. International Journal of Social Robotics, 11:515–538.
  51. Incorporating Geo-Diverse Knowledge into Prompting for Increased Geographical Robustness in Object Recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13515–13524.
  52. How culturally aware are vision-language models? ArXiv preprint, abs/2405.17475.
  53. How large language models can reshape collective intelligence. Nature Human Behaviour, 8(9):1643–1655.
  54. High-dimension human value representation in large language models. ArXiv preprint, abs/2404.07900.
  55. Cendol: Open instruction-tuned generative large language models for indonesian languages. ArXiv preprint, abs/2404.06138.
  56. Semantics derived automatically from language corpora contain human-like biases. Science, 356(6334):183–186.
  57. C3bench: A comprehensive classical chinese understanding benchmark for large language models. ArXiv preprint, abs/2405.17732.
  58. Theory-grounded measurement of U.S. social stereotypes in English language models. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1276–1295, Association for Computational Linguistics, Seattle, United States.
  59. Bridging cultural nuances in dialogue agents through cultural value surveys. In Findings of the Association for Computational Linguistics: EACL 2024, pages 929–945, Association for Computational Linguistics, St. Julian’s, Malta.
  60. Cultural adaptation of recipes. Transactions of the Association for Computational Linguistics, 12:80–99.
  61. Exploring visual culture awareness in gpt-4v: A comprehensive probing. ArXiv preprint, abs/2402.06015.
  62. Assessing cross-cultural alignment between ChatGPT and human societies: An empirical study. In Proceedings of the First Workshop on Cross-Cultural Considerations in NLP (C3NLP), pages 53–67, Association for Computational Linguistics, Dubrovnik, Croatia.
  63. MultiPICo: Multilingual perspectivist irony corpus. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 16008–16021, Association for Computational Linguistics, Bangkok, Thailand.
  64. Center, Pew Research. 2022. Pew global attitudes survey. Accessed: 2022.
  65. Cetinic, Eva. 2022. The myth of culturally agnostic ai models. ArXiv preprint, abs/2211.15271.
  66. Sociocultural norm similarities and differences via situational alignment and explainable textual entailment. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 3548–3564, Association for Computational Linguistics, Singapore.
  67. Harmonizing global voices: Culturally-aware models for enhanced content moderation. ArXiv preprint, abs/2312.02401.
  68. Benchmarking Cognitive Domains for LLMs: Insights from Taiwanese Hakka Culture. ArXiv preprint, abs/2409.01556.
  69. Benchmarking llms for translating classical chinese poetry: Evaluating adequacy, fluency, and elegance. ArXiv preprint, abs/2408.09945.
  70. AlpaGasus: Training a Better Alpaca with Fewer Data. In The Twelfth International Conference on Learning Representations.
  71. Marked personas: Using natural language prompts to measure stereotypes in language models. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1504–1532, Association for Computational Linguistics, Toronto, Canada.
  72. Culturalteaming: Ai-assisted interactive red-teaming for challenging llms’(lack of) multicultural knowledge. ArXiv preprint, abs/2404.06664.
  73. DALL-EVAL: probing the reasoning skills and social biases of text-to-image generation models. In IEEE/CVF International Conference on Computer Vision, ICCV 2023, Paris, France, October 1-6, 2023, pages 3020–3031, IEEE.
  74. The Echoes of Multilinguality: Tracing Cultural Value Shifts during Language Model Fine-tuning. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 15042–15058, Association for Computational Linguistics, Bangkok, Thailand.
  75. GuyLingo: The Republic of Guyana creole corpora. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers), pages 792–798, Association for Computational Linguistics, Mexico City, Mexico.
  76. Unsupervised cross-lingual representation learning at scale. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 8440–8451, Association for Computational Linguistics, Online.
  77. "they are uncultured": Unveiling covert harms and social threats in llm generated conversations. ArXiv preprint, abs/2405.05378.
  78. Toward cultural bias evaluation datasets: The case of Bengali gender, religious, and national identity. In Proceedings of the First Workshop on Cross-Cultural Considerations in NLP (C3NLP), pages 68–83, Association for Computational Linguistics, Dubrovnik, Croatia.
  79. Davis, Ernest and Gary Marcus. 2015. Commonsense reasoning and commonsense knowledge in artificial intelligence. Communications of the ACM, 58(9):92–103.
  80. Chatgpt and the rise of large language models: the new ai-driven infodemic threat in public health. Frontiers in public health, 11:1166120.
  81. Masive: Open-ended affective state identification in english and spanish. ArXiv preprint, abs/2407.12196.
  82. Dehghan, Somaiyeh and Berrin Yanıkoğlu. 2024. Multi-domain Hate Speech Detection Using Dual Contrastive Learning and Paralinguistic Features. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 11745–11755, ELRA and ICCL, Torino, Italia.
  83. Research on Generating Cultural Relic Images Based on a Low-Rank Adaptive Diffusion Model. In Proceedings of the 2024 Guangdong-Hong Kong-Macao Greater Bay Area International Conference on Digital Economy and Artificial Intelligence, pages 629–634.
  84. Toxicity in chatgpt: Analyzing persona-assigned language models. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 1236–1270, Association for Computational Linguistics, Singapore.
  85. StereoKG: Data-driven knowledge graph construction for cultural knowledge and stereotypes. In Proceedings of the Sixth Workshop on Online Abuse and Harms (WOAH), pages 67–78, Association for Computational Linguistics, Seattle, Washington (Hybrid).
  86. Building socio-culturally inclusive stereotype resources with community engagement. In Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10 - 16, 2023.
  87. GlobalWoZ: Globalizing MultiWoZ to Develop Multilingual Task-Oriented Dialogue Systems. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1639–1657, Association for Computational Linguistics, Dublin, Ireland.
  88. Chinese tiny llm: Pretraining a chinese-centric large language model. ArXiv preprint, abs/2404.04167.
  89. Alpacafarm: A simulation framework for methods that learn from human feedback. In Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10 - 16, 2023.
  90. Enhancing expressivity transfer in textless speech-to-speech translation. In 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), pages 1–8, IEEE.
  91. Towards measuring the representation of subjective global opinions in language models. ArXiv preprint, abs/2306.16388.
  92. EtiCor: Corpus for analyzing LLMs for etiquettes. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 6921–6931, Association for Computational Linguistics, Singapore.
  93. Toucan: Many-to-many translation for 150 african language pairs. ArXiv preprint, abs/2407.04796.
  94. Ember, Carol R. 2009. Cross-cultural research methods. Rowman Altamira.
  95. In what languages are generative language models the most formal? analyzing formality distribution across languages. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 2650–2666, Association for Computational Linguistics, Singapore.
  96. España-Bonet, Cristina and Alberto Barrón-Cedeño. 2022. The (undesired) attenuation of human biases by multilinguality. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 2056–2077, Association for Computational Linguistics, Abu Dhabi, United Arab Emirates.
  97. Bertaqa: How much do language models know about local culture? ArXiv preprint, abs/2406.07302.
  98. DIALECTBENCH: A NLP Benchmark for Dialects, Varieties, and Closely-Related Languages. ArXiv preprint, abs/2403.11009.
  99. Cichmkg: a large-scale and comprehensive chinese intangible cultural heritage multimodal knowledge graph. Heritage Science, 11:1–18.
  100. Towards artificial general intelligence via a multimodal foundation model. Nature Communications, 13(1):3094.
  101. Modular pluralism: Pluralistic alignment via multi-llm collaboration. ArXiv preprint, abs/2406.15951.
  102. Biased ai can influence political decision-making. ArXiv preprint, abs/2410.06415.
  103. Your stereotypical mileage may vary: Practical challenges of evaluating biases in multiple languages and cultural contexts. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 17764–17769, ELRA and ICCL, Torino, Italia.
  104. NORMSAGE: Multi-lingual multi-cultural norm discovery from conversations on-the-fly. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 15217–15230, Association for Computational Linguistics, Singapore.
  105. Massively multi-cultural knowledge acquisition & lm benchmarking. ArXiv preprint, abs/2402.09369.
  106. Multilingual dyadic interaction corpus noxi+ j: Toward understanding asian-european non-verbal cultural characteristics and their influences on engagement. ArXiv preprint, abs/2409.13726.
  107. Furst, Edward J. 1981. Bloom’s taxonomy of educational objectives for the cognitive domain: Philosophical and educational issues. Review of educational research, 51(4):441–453.
  108. Gabriel, Iason. 2020. Artificial Intelligence, Values, and Alignment. Minds and Machines, 30(3):411–437.
  109. Red teaming language models to reduce harms: Methods, scaling behaviors, and lessons learned. ArXiv preprint, abs/2209.07858.
  110. RealToxicityPrompts: Evaluating neural toxic degeneration in language models. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 3356–3369, Association for Computational Linguistics, Online.
  111. Khayyam challenge (persianmmlu): Is your llm truly wise to the persian language? ArXiv preprint, abs/2404.06644.
  112. Geneval: An object-focused framework for evaluating text-to-image alignment. In Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10 - 16, 2023.
  113. Goldman, Josh and John K Tsotsos. 2024. Statistical challenges with dataset construction: Why you will never have enough images. ArXiv preprint, abs/2408.11160.
  114. Measuring individual differences in implicit cognition: the implicit association test. Journal of personality and social psychology, 74(6):1464.
  115. RuBia: A Russian language bias detection dataset. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 14227–14239, ELRA and ICCL, Torino, Italia.
  116. AI and the Transformation of Social Science Research. Science, 380(6650):1108–1109.
  117. Walledeval: A comprehensive safety evaluation toolkit for large language models. ArXiv preprint, abs/2408.03837.
  118. Haerpfer, Christian W and Kseniya Kizilova. 2012. The world values survey. The Wiley-Blackwell Encyclopedia of Globalization, pages 1–5.
  119. Halpern, Ben. 1955. The dynamic elements of culture. Ethics, 65:235 – 249.
  120. Mosaic: Finding artistic connections across culture with conditional image retrieval. In NeurIPS 2020 Competition and Demonstration Track, pages 133–155, PMLR.
  121. Speaking Multiple Languages Affects the Moral Bias of Language Models. In Findings of the Association for Computational Linguistics: ACL 2023, pages 2137–2156, Association for Computational Linguistics, Toronto, Canada.
  122. Bridging background knowledge gaps in translation with automatic explicitation. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 9718–9735, Association for Computational Linguistics, Singapore.
  123. ToxiGen: A large-scale machine-generated dataset for adversarial and implicit hate speech detection. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 3309–3326, Association for Computational Linguistics, Dublin, Ireland.
  124. Nativqa: Multilingual culturally-aligned natural query for llms. ArXiv preprint, abs/2407.09823.
  125. Building knowledge-guided lexica to model cultural variation. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 211–226, Association for Computational Linguistics, Mexico City, Mexico.
  126. Multilingual language models are not multicultural: A case study in emotion. In Proceedings of the 13th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis, pages 202–214, Association for Computational Linguistics, Toronto, Canada.
  127. How far can we extract diverse perspectives from large language models? criteria-based diversity prompting! ArXiv preprint, abs/2311.09799.
  128. Chumor 1.0: A truly funny and challenging chinese humor understanding dataset from ruo zhi ba. ArXiv preprint, abs/2406.12754.
  129. Recent advances in hate speech moderation: Multimodality and the role of large models. ArXiv preprint, abs/2401.16727.
  130. Do Images really do the Talking? Analysing the significance of Images in Tamil Troll meme classification. ArXiv preprint, abs/2108.03886.
  131. Measuring massive multitask language understanding. In 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021, OpenReview.net.
  132. The weirdest people in the world? Behavioral and brain sciences, 33(2-3):61–83.
  133. Challenges and strategies in cross-cultural NLP. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 6997–7013, Association for Computational Linguistics, Dublin, Ireland.
  134. Heylighen, Francis and Jean-Marc Dewaele. 1999. Formality of language: definition, measurement and behavioral determinants. Interner Bericht, Center “Leo Apostel”, Vrije Universiteit Brüssel, 4(1).
  135. Imagen video: High definition video generation with diffusion models. ArXiv preprint, abs/2210.02303.
  136. Hofstede, Geert. 2005. Culture’s recent consequences. In Designing for Global Markets 7, IWIPS 2005, Bridging Cultural Differences, 7-9 July 2005, Amsterdam, The Netherlands, Proceedings of the Seventh International Workshop on Internationalisation of Products and Systems, pages 3–4, Product & Systems Internationalisation, Inc.
  137. Cultures and organizations: Software of the mind, third edition, 3 edition. McGraw-Hill Professional, New York, NY.
  138. Holton, Robert. 2000. Globalization’s cultural consequences. The ANNALS of the American academy of political and social science, 570(1):140–152.
  139. Horton, John J. 2023. Large Language Models as Simulated Economic Agents: What Can We Learn from Homo Silicus? Working Paper 31122, National Bureau of Economic Research.
  140. Hovy, Dirk and Diyi Yang. 2021. The importance of modeling social factors of language: Theory and practice. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 588–602, Association for Computational Linguistics, Online.
  141. Hsiao, Wei-Lin and Kristen Grauman. 2021. From Culture to Clothing: Discovering the World Events Behind A Century of Fashion Images. In 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021, Montreal, QC, Canada, October 10-17, 2021, pages 1046–1055, IEEE.
  142. Multi 3 WOZ: A multilingual, multi-domain, multi-parallel dataset for training and evaluating culturally adapted task-oriented dialog systems. Transactions of the Association for Computational Linguistics, 11:1396–1415.
  143. TIFA: accurate and interpretable text-to-image faithfulness evaluation with question answering. In IEEE/CVF International Conference on Computer Vision, ICCV 2023, Paris, France, October 1-6, 2023, pages 20349–20360, IEEE.
  144. AceGPT, localizing large language models in Arabic. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 8139–8163, Association for Computational Linguistics, Mexico City, Mexico.
  145. T2i-compbench: A comprehensive benchmark for open-world compositional text-to-image generation. In Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10 - 16, 2023.
  146. Huang, Yufei and Deyi Xiong. 2024. CBBQ: A Chinese bias benchmark dataset curated with human-AI collaboration for large language models. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 2917–2929, ELRA and ICCL, Torino, Italia.
  147. Multi2WOZ: A Robust Multilingual Dataset and Conversational Pretraining for Task-Oriented Dialog. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 3687–3703, Association for Computational Linguistics, Seattle, United States.
  148. Aligning language models to user opinions. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 5906–5919, Association for Computational Linguistics, Singapore.
  149. Cross-cultural inspiration detection and analysis in real and llm-generated social media data. ArXiv preprint, abs/2404.12933.
  150. The relation between language, culture, and thought. Current Opinion in Psychology, 8:70–77. Culture.
  151. Glot500: Scaling multilingual corpora and language models to 500 languages. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1082–1117, Association for Computational Linguistics, Toronto, Canada.
  152. Jahoda, Gustav and Harry McGurk. 1974. Pictorial depth perception: a developmental study. British journal of psychology, 65 1:141–9.
  153. Co-writing with opinionated language models affects users’ views. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, CHI 2023, Hamburg, Germany, April 23-28, 2023, pages 111:1–111:15, ACM.
  154. Co-Writing with Opinionated Language Models Affects Users’ Views. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, CHI 2023, Hamburg, Germany, April 23-28, 2023, pages 111:1–111:15, ACM.
  155. Gpt-4 can pass the korean national licensing examination for korean medicine doctors. PLOS Digital Health, 2(12):e0000416.
  156. Indicvoices: Towards building an inclusive multilingual speech dataset for indian languages. ArXiv preprint, abs/2403.01926.
  157. KOLD: Korean offensive language dataset. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 10818–10833, Association for Computational Linguistics, Abu Dhabi, United Arab Emirates.
  158. SeeGULL: A stereotype benchmark with broad geo-cultural coverage leveraging generative models. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 9851–9870, Association for Computational Linguistics, Toronto, Canada.
  159. ViSAGe: A Global-Scale Analysis of Visual Stereotypes in Text-to-Image Generation. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 12333–12347.
  160. Bridging discrete and continuous: A multimodal strategy for complex emotion detection. ArXiv preprint, abs/2409.07901.
  161. Jiang, Ming and Mansi Joshi. 2024. CPopQA: Ranking cultural concept popularity by LLMs. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers), pages 615–630, Association for Computational Linguistics, Mexico City, Mexico.
  162. KoBBQ: Korean bias benchmark for question answering. Transactions of the Association for Computational Linguistics, 12:507–524.
  163. Jinnai, Yuu. 2024. Does cross-cultural alignment change the commonsense morality of language models? In Proceedings of the 2nd Workshop on Cross-Cultural Considerations in NLP, pages 48–64, Association for Computational Linguistics, Bangkok, Thailand.
  164. The ghost in the machine has an American accent: Value conflict in GPT-3. ArXiv preprint, abs/2203.07785.
  165. Jones, Ruth and Ann Irvine. 2013. The (un)faithful machine translator. In Proceedings of the 7th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities, pages 96–101, Association for Computational Linguistics, Sofia, Bulgaria.
  166. The state and fate of linguistic diversity and inclusion in the NLP world. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 6282–6293, Association for Computational Linguistics, Online.
  167. Visual affect around the world: A large-scale multilingual visual sentiment ontology. In Proceedings of the 23rd ACM international conference on Multimedia, pages 159–168.
  168. Multi-lingual and multi-cultural figurative language understanding. In Findings of the Association for Computational Linguistics: ACL 2023, pages 8269–8284, Association for Computational Linguistics, Toronto, Canada.
  169. TARJAMAT: Evaluation of Bard and ChatGPT on Machine Translation of Ten Arabic Varieties. In Proceedings of ArabicNLP 2023, pages 52–75, Association for Computational Linguistics, Singapore (Hybrid).
  170. Thorny roses: Investigating the dual use dilemma in natural language processing. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 13977–13998, Association for Computational Linguistics, Singapore.
  171. Beyond aesthetics: Cultural competence in text-to-image models. ArXiv preprint, abs/2407.06863.
  172. Quantifying the Dialect Gap and its Correlates Across Languages. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 7226–7245, Association for Computational Linguistics, Singapore.
  173. ChatGPT for good? On opportunities and challenges of large language models for education.
  174. Epistemic Injustice in Generative AI. Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 7:684–697.
  175. Keleg, Amr and Walid Magdy. 2023. DLAMA: A framework for curating culturally diverse facts for probing the knowledge of pretrained language models. In Findings of the Association for Computational Linguistics: ACL 2023, pages 6245–6266, Association for Computational Linguistics, Toronto, Canada.
  176. Vyaktitv: A multimodal peer-to-peer Hindi conversations based dataset for personality assessment. In 2020 IEEE Sixth International Conference on Multimedia Big Data (BigMM), pages 103–111, IEEE.
  178. Indian-BhED: A dataset for measuring India-centric biases in large language models. In Proceedings of the 2024 International Conference on Information Technology for Social Good, GoodIT ’24, pages 231–239, Association for Computing Machinery, New York, NY, USA.
  179. An image speaks a thousand words, but can everyone listen? on translating images for cultural relevance. ArXiv preprint, abs/2404.01247.
  180. What changes can large-scale language models bring? intensive study on HyperCLOVA: Billions-scale Korean generative pretrained transformers. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 3405–3424, Association for Computational Linguistics, Online and Punta Cana, Dominican Republic.
  181. CLIcK: A Benchmark Dataset of Cultural and Linguistic Intelligence in Korean. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 3335–3346, ELRA and ICCL, Torino, Italia.
  182. K-pop Lyric Translation: Dataset, Analysis, and Neural-Modelling. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 9974–9987, ELRA and ICCL, Torino, Italia.
  183. How Do Moral Emotions Shape Political Participation? A Cross-Cultural Analysis of Online Petitions Using Language Models. In Findings of the Association for Computational Linguistics ACL 2024, pages 16274–16289, Association for Computational Linguistics, Bangkok, Thailand and virtual meeting.
  184. KpopMT: Translation Dataset with Terminology for Kpop Fandom. ArXiv preprint, abs/2407.07413.
  185. The Benefits, Risks and Bounds of Personalizing the Alignment of Large Language Models to Individuals. Nature Machine Intelligence, 6(4):383–392.
  186. From bytes to borsch: Fine-tuning gemma and mistral for the Ukrainian language representation. In Proceedings of the Third Ukrainian Natural Language Processing Workshop (UNLP) @ LREC-COLING 2024, pages 83–94, ELRA and ICCL, Torino, Italia.
  187. OpenAssistant Conversations - Democratizing Large Language Model Alignment. In Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10 - 16, 2023.
  188. The Challenges of Creating a Parallel Multilingual Hate Speech Corpus: An Exploration. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 15842–15853, ELRA and ICCL, Torino, Italia.
  189. SEWA DB: A rich database for audio-visual emotion and sentiment research in the wild. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(3):1022–1040.
  190. Large language models only pass primary school exams in Indonesia: A comprehensive test on IndoMMLU. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 12359–12374, Association for Computational Linguistics, Singapore.
  191. ArabicMMLU: Assessing massive multitask language understanding in Arabic. In Findings of the Association for Computational Linguistics ACL 2024, pages 5622–5640, Association for Computational Linguistics, Bangkok, Thailand and virtual meeting.
  192. IndoCulture: Exploring Geographically-Influenced Cultural Commonsense Reasoning Across Eleven Indonesian Provinces. ArXiv preprint, abs/2404.01854.
  193. Kunda, Maithilee and Irina Rabkina. 2020. Creative captioning: An AI grand challenge based on the Dixit board game. ArXiv preprint, abs/2010.00048.
  194. The Open Images dataset V4: Unified image classification, object detection, and visual relationship detection at scale. International Journal of Computer Vision, 128(7):1956–1981.
  195. When Do Pre-Training Biases Propagate to Downstream Tasks? A Case Study in Text Summarization. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, pages 3206–3219, Association for Computational Linguistics, Dubrovnik, Croatia.
  196. Improving diversity of demographic representation in large language models via collective-critiques and self-voting. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 10383–10405, Association for Computational Linguistics, Singapore.
  197. Le, Thang and Anh Luu. 2023. A Parallel Corpus for Vietnamese Central-Northern Dialect Text Transfer. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 13839–13855, Association for Computational Linguistics, Singapore.
  198. KoSBI: A dataset for mitigating social bias risks towards safer large language model applications. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 5: Industry Track), pages 208–224, Association for Computational Linguistics, Toronto, Canada.
  199. KorNAT: LLM alignment benchmark for Korean social values and common knowledge. ArXiv preprint, abs/2402.13605.
  200. Exploring cross-cultural differences in English hate speech annotations: From dataset construction to analysis. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 4205–4224, Association for Computational Linguistics, Mexico City, Mexico.
  201. Hate speech classifiers are culturally insensitive. In Proceedings of the First Workshop on Cross-Cultural Considerations in NLP (C3NLP), pages 35–46, Association for Computational Linguistics, Dubrovnik, Croatia.
  202. Leeb, Felix and Bernhard Schölkopf. 2024. A diverse multilingual news headlines dataset from around the world. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers), pages 647–652, Association for Computational Linguistics, Mexico City, Mexico.
  203. BHASA: A holistic Southeast Asian linguistic and cultural evaluation suite for large language models. ArXiv preprint, abs/2309.06085.
  204. GenAI-Bench: A holistic benchmark for compositional text-to-visual generation. In Synthetic Data for Computer Vision Workshop @ CVPR 2024.
  205. CultureLLM: Incorporating cultural differences into large language models. ArXiv preprint, abs/2402.10946.
  206. CulturePark: Boosting cross-cultural understanding in large language models. ArXiv preprint, abs/2405.15145.
  207. Translate the beauty in songs: Jointly learning to align melody and translate lyrics. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 27–39, Association for Computational Linguistics, Singapore.
  208. X-Instruction: Aligning Language Model in Low-resource Languages with Self-curated Cross-lingual Instructions. In Findings of the Association for Computational Linguistics ACL 2024, pages 546–566, Association for Computational Linguistics, Bangkok, Thailand and virtual meeting.
  209. CMMLU: Measuring massive multitask language understanding in Chinese. In Findings of the Association for Computational Linguistics ACL 2024, pages 11260–11285, Association for Computational Linguistics, Bangkok, Thailand and virtual meeting.
  210. CULTURE-GEN: Revealing global cultural perception in language models through natural language prompting. ArXiv preprint, abs/2404.10199.
  211. How well do LLMs identify cultural unity in diversity? In First Conference on Language Modeling.
  212. NormDial: A comparable bilingual synthetic dialog dataset for modeling social norm adherence and violation. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 15732–15744, Association for Computational Linguistics, Singapore.
  213. An Unsupervised Framework for Adaptive Context-aware Simplified-Traditional Chinese Character Conversion. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 1318–1326, ELRA and ICCL, Torino, Italia.
  214. FoundaBench: Evaluating Chinese Fundamental Knowledge Capabilities of Large Language Models. ArXiv preprint, abs/2404.18359.
  215. A survey of foundation models for music understanding. ArXiv preprint, abs/2409.09601.
  216. FoodieQA: A multimodal dataset for fine-grained understanding of Chinese food culture. ArXiv preprint, abs/2406.11030.
  217. CIF-Bench: A Chinese Instruction-Following Benchmark for Evaluating the Generalizability of Large Language Models. ArXiv preprint, abs/2402.13109.
  218. A multi-modal knowledge graph for classical Chinese poetry. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 2318–2326, Association for Computational Linguistics, Abu Dhabi, United Arab Emirates.
  219. Li, Zhi and Yin Zhang. 2023. Cultural Concept Adaptation on Multimodal Reasoning. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 262–276, Association for Computational Linguistics, Singapore.
  220. From text to historical ecological knowledge: The construction and application of the Shan jing knowledge base. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 7521–7530, ELRA and ICCL, Torino, Italia.
  221. Microsoft COCO: Common objects in context. In Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13, pages 740–755, Springer.
  222. Few-shot learning with multilingual language models. ArXiv preprint, abs/2112.10668.
  223. Lin, Yen-Ting and Yun-Nung Chen. 2023. Taiwan LLM: Bridging the linguistic divide with a culturally aligned language model. ArXiv preprint, abs/2311.17487.
  224. On the cultural gap in text-to-image generation. ArXiv preprint, abs/2307.02971.
  225. FigMemes: A dataset for figurative language identification in politically-opinionated memes. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 7069–7086, Association for Computational Linguistics, Abu Dhabi, United Arab Emirates.
  226. Are multilingual LLMs culturally-diverse reasoners? an investigation into multicultural proverbs and sayings. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 2016–2039, Association for Computational Linguistics, Mexico City, Mexico.
  227. Culturally aware and adapted nlp: A taxonomy and a survey of the state of the art. ArXiv preprint, abs/2406.03930.
  228. Testing the ability of language models to interpret figurative language. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 4437–4452, Association for Computational Linguistics, Seattle, United States.
  229. Visually grounded reasoning across languages and cultures. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 10467–10485, Association for Computational Linguistics, Online and Punta Cana, Dominican Republic.
  230. Logiqa 2.0—an improved dataset for logical reasoning in natural language understanding. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 31:2947–2962.
  231. Liu, Meina. 2016. Verbal Communication Styles and Culture.
  232. Counterfactual recipe generation: Exploring compositional generalization in a realistic scenario. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 7354–7370, Association for Computational Linguistics, Abu Dhabi, United Arab Emirates.
  233. Towards a description of Chinese morphemic concepts and semantic word-formation. Journal of Chinese Information Processing, 32(2):11–20.
  234. OMGEval: An open multilingual generative evaluation benchmark for large language models. ArXiv preprint, abs/2402.13524.
  235. FunnyNet-W: Multimodal learning of funny moments in videos in the wild. International Journal of Computer Vision, pages 1–22.
  236. Towards Equitable Representation in Text-to-Image Synthesis Models with the Cross-Cultural Understanding Benchmark (CCUB) Dataset. ArXiv preprint, abs/2301.12073.
  237. Not always about you: Prioritizing community needs when developing endangered language technology. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 3933–3944, Association for Computational Linguistics, Dublin, Ireland.
  238. CCEval: A representative evaluation benchmark for the Chinese-centric multilingual machine translation. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 10176–10184, Association for Computational Linguistics, Singapore.
  239. SEACrowd: A multilingual multimodal data hub and benchmark suite for Southeast Asian languages. ArXiv preprint, abs/2406.10118.
  240. Subverting machines, fluctuating identities: Re-learning human categorization. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency, FAccT ’22, pages 1005–1015, Association for Computing Machinery, New York, NY, USA.
  241. GPT-4V(ision) as a social media analysis engine. ArXiv preprint, abs/2311.07547.
  242. EnCBP: A new benchmark dataset for finer-grained cultural background prediction in English. In Findings of the Association for Computational Linguistics: ACL 2022, pages 2811–2823, Association for Computational Linguistics, Dublin, Ireland.
  243. Food-500 Cap: A fine-grained food caption benchmark for evaluating vision-language models. In Proceedings of the 31st ACM International Conference on Multimedia.
  244. You are what you eat? Feeding foundation models a regionally diverse food dataset of World Wide Dishes. ArXiv preprint, abs/2406.09496.
  245. Cross-lingual dialogue dataset creation via outline-based generation. Transactions of the Association for Computational Linguistics, 11:139–156.
  246. FairylandAI: Personalized fairy tales utilizing ChatGPT and DALLE-3. ArXiv preprint, abs/2407.09467.
  247. Large language models are geographically biased. In Proceedings of the 41st International Conference on Machine Learning, volume 235 of Proceedings of Machine Learning Research, pages 34654–34669, PMLR.
  248. OK-VQA: A visual question answering benchmark requiring external knowledge. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, June 16-20, 2019, pages 3195–3204, Computer Vision Foundation / IEEE.
  249. Listening to Affected Communities to Define Extreme Speech: Dataset and Experiments. In Findings of the Association for Computational Linguistics: ACL 2022, pages 1089–1104, Association for Computational Linguistics, Dublin, Ireland.
  250. "Vorbeşti româneşte?" A recipe to train powerful Romanian LLMs with English instructions. ArXiv preprint, abs/2406.18266.
  252. Cultural alignment in large language models: An explanatory analysis based on Hofstede’s cultural dimensions. ArXiv preprint, abs/2309.12342.
  253. Towards intercultural affect recognition: Audio-visual affect recognition in the wild across six cultures. In 2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition (FG), pages 1–6, IEEE.
  254. Matsumoto, David. 2007. Culture, context, and behavior. Journal of Personality, 75(6):1285–1320.
  255. Mausam. 2016. Open information extraction systems and downstream applications. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, IJCAI 2016, New York, NY, USA, 9-15 July 2016, pages 4074–4077, IJCAI/AAAI Press.
  256. LocalValueBench: A collaboratively built and extensible benchmark for evaluating localized value alignment and ethical safety in large language models. ArXiv preprint, abs/2408.01460.
  257. Menadue, Christopher Benjamin and Karen Diane Cheer. 2017. Human culture and science fiction: A review of the literature, 1980-2016. Sage Open, 7(3):2158244017723690.
  258. Modelling the influence of cultural information on vision-based human home activity recognition. In 2017 14th International Conference on Ubiquitous Robots and Ambient Intelligence (URAI), pages 32–38, IEEE.
  259. Disce aut deficere: Evaluating LLMs proficiency on the INVALSI Italian benchmark. ArXiv preprint, abs/2406.17535.
  260. Detecting Hofstede cultural dimensions. Emotion, Personality and Cultural Aspects in Crowds: Towards a Geometrical Mind, pages 93–103.
  261. ISIA food-500: A dataset for large-scale food recognition via stacked global-local attention network. In MM ’20: The 28th ACM International Conference on Multimedia, Virtual Event / Seattle, WA, USA, October 12-16, 2020, pages 393–401.
  262. A Robot Walks into a Bar: Can Language Models Serve as Creativity Support Tools for Comedy? An Evaluation of LLMs’ Humour Alignment with Comedians. In Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency, FAccT ’24, pages 1622–1636, Association for Computing Machinery, New York, NY, USA.
  263. Global-Liar: Factuality of LLMs over time and geographic regions. ArXiv preprint, abs/2401.17839.
  264. Navigating text-to-image generative bias across Indic languages. ArXiv preprint, abs/2408.00283.
  265. NormMark: A weakly supervised Markov model for socio-cultural norm discovery. In Findings of the Association for Computational Linguistics: ACL 2023, pages 5081–5089, Association for Computational Linguistics, Toronto, Canada.
  266. ArtELingo: A million emotion annotations of WikiArt with emphasis on diversity over language and culture. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 8770–8785, Association for Computational Linguistics, Abu Dhabi, United Arab Emirates.
  267. Mohammad, Saif and Svetlana Kiritchenko. 2018. WikiArt emotions: An annotated dataset of emotions evoked by art. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), European Language Resources Association (ELRA), Miyazaki, Japan.
  268. Advancing cultural inclusivity: Optimizing embedding spaces for balanced music recommendations. ArXiv preprint, abs/2405.17607.
  269. Moseley, Christopher. 2010. Atlas of the World’s Languages in Danger. UNESCO.
  270. AraDiCE: Benchmarks for dialectal and cultural capabilities in LLMs. ArXiv preprint, abs/2409.11404.
  271. Global Gallery: The Fine Art of Painting Culture Portraits through Multilingual Instruction Tuning. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 6398–6415, Association for Computational Linguistics, Mexico City, Mexico.
  272. Global Voices, local biases: Socio-cultural prejudices across languages. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 15828–15845, Association for Computational Linguistics, Singapore.
  273. Cultural conditioning or placebo? on the effectiveness of socio-demographic prompting. ArXiv preprint, abs/2406.11661.
  274. BLEnD: A Benchmark for LLMs on Everyday Knowledge in Diverse Cultures and Languages. ArXiv preprint, abs/2406.09948.
  276. CoCoA-MT: A dataset and benchmark for contrastive controlled MT with application to formality. In Findings of the Association for Computational Linguistics: NAACL 2022, pages 616–632, Association for Computational Linguistics, Seattle, United States.
  277. CrowS-pairs: A challenge dataset for measuring social biases in masked language models. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1953–1967, Association for Computational Linguistics, Online.
  278. Having Beer after Prayer? Measuring Cultural Bias in Large Language Models. ArXiv preprint, abs/2305.14456.
  279. Bias neutralization framework: Measuring fairness in large language models with bias intelligence quotient (biq). ArXiv preprint, abs/2404.18276.
  280. Biases in Large Language Models: Origins, Inventory, and Discussion. J. Data and Information Quality, 15(2).
  281. Benchmarking vision language models for cultural understanding. ArXiv preprint, abs/2407.10920.
  282. Naylor, Larry. 1996. Culture and change: An introduction. Bloomsbury Publishing USA.
  283. Intra-, inter-, and cross-cultural classification of vocal affect. In Interspeech 2011, pages 1581–1584, ISCA.
  284. MBBQ: A dataset for cross-lingual comparison of stereotypes in generative LLMs. ArXiv preprint, abs/2406.07243.
  285. French CrowS-pairs: Extending a challenge dataset for measuring social bias in masked language models to a language other than English. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 8521–8531, Association for Computational Linguistics, Dublin, Ireland.
  286. Newmark, Peter. 2003. A textbook of translation.
  287. CulturaX: A cleaned, enormous, and multilingual dataset for large language models in 167 languages. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 4226–4237, ELRA and ICCL, Torino, Italia.
  288. Extracting cultural commonsense knowledge at scale. In Proceedings of the ACM Web Conference 2023, WWW 2023, Austin, TX, USA, 30 April 2023 - 4 May 2023, pages 1907–1917, ACM.
  290. Multi-cultural commonsense knowledge distillation. ArXiv preprint, abs/2402.10689.
  291. SeaLLMs–Large Language Models for Southeast Asia. ArXiv preprint, abs/2312.00738.
  292. OpenITI: A Machine-Readable Corpus of Islamicate Texts (2021.2.5) [Data set].
  293. Nisbett, Richard E. and Takahiko Masuda. 2003. Culture and point of view. Proceedings of the National Academy of Sciences of the United States of America, 100:11163 – 11170.
  294. Universal Dependencies. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Tutorial Abstracts, Association for Computational Linguistics, Valencia, Spain.
  295. Nordhoff, Sebastian. Linked Data for Linguistic Diversity Research: Glottolog/Langdoc and ASJP Online. In Linked Data in Linguistics: Representing and Connecting Language Data and Language Metadata. Springer, pages 191–200.
  296. Survey on emotional body gesture recognition. IEEE Transactions on Affective Computing, 12(2):505–523.
  297. Beyond Metrics: Evaluating LLMs’ Effectiveness in Culturally Nuanced, Low-Resource Real-World Scenarios. ArXiv preprint, abs/2406.00343.
  298. Ofer, Dan and Dafna Shahaf. 2022. Cards against AI: Predicting humor in a fill-in-the-blank party game. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 5397–5403, Association for Computational Linguistics, Abu Dhabi, United Arab Emirates.
  299. OpenAI. ChatGPT. https://chat.openai.com/.
  300. OpenAI. 2023. GPT-4 Technical Report. ArXiv preprint, abs/2303.08774.
  301. Komodo: A Linguistic Expedition into Indonesia’s Regional Languages. ArXiv preprint, abs/2403.09362.
  302. Towards cross-lingual explanation of artwork in large-scale vision language models. ArXiv preprint, abs/2409.01584.
  303. Does Writing with Language Models Reduce Content Diversity? ArXiv preprint, abs/2309.05196.
  304. Palta, Shramay and Rachel Rudinger. 2023. FORK: A bite-sized test set for probing culinary cultural biases in commonsense reasoning models. In Findings of the Association for Computational Linguistics: ACL 2023, pages 9952–9962, Association for Computational Linguistics, Toronto, Canada.
  305. Pandya, Keivalya and Mehfuza Holia. 2023. Automating Customer Service using LangChain: Building custom open-source GPT Chatbot for organizations. ArXiv preprint, abs/2310.05421.
  306. Multilingual visual sentiment concept matching. In Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval, pages 151–158.
  307. HaVQA: A Dataset for Visual Question Answering and Multimodal Research in Hausa Language. In Findings of the Association for Computational Linguistics: ACL 2023, pages 10162–10183, Association for Computational Linguistics, Toronto, Canada.
  308. Social simulacra: Creating populated prototypes for social computing systems. In Proceedings of the 35th Annual ACM Symposium on User Interface Software and Technology, UIST ’22, Association for Computing Machinery, New York, NY, USA.
  309. BBQ: A hand-built bias benchmark for question answering. In Findings of the Association for Computational Linguistics: ACL 2022, pages 2086–2105, Association for Computational Linguistics, Dublin, Ireland.
  310. Red teaming language models with language models. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 3419–3448, Association for Computational Linguistics, Abu Dhabi, United Arab Emirates.
  311. Peters, Heinrich and Sandra C Matz. 2024. Large Language Models Can Infer Psychological Dispositions of Social Media Users. PNAS Nexus, 3(6):pgae231.
  312. Peterson, Sharyl Bender and Mary Alyce Lach. 1990. Gender stereotypes in children’s books: their prevalence and influence on cognitive and affective development. Gender and Education, 2(2):185–197.
  313. Picca, Davide and John Pavlopoulos. 2024. Deciphering emotional landscapes in the Iliad: A novel French-annotated dataset for emotion recognition. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 4462–4467, ELRA and ICCL, Torino, Italia.
  314. Typhoon: Thai large language models. ArXiv preprint, abs/2312.13951.
  315. Sabiá: Portuguese large language models. In Brazilian Conference on Intelligent Systems, pages 226–240, Springer.
  316. Civics: Building a dataset for examining culturally-informed values in large language models. ArXiv preprint, abs/2405.13974.
  317. GRASP: A Disagreement Analysis Framework to Assess Group Associations in Perspectives. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 3473–3492, Association for Computational Linguistics, Mexico City, Mexico.
  318. Cultural Incongruencies in Artificial Intelligence. ArXiv preprint, abs/2211.13069.
  319. Detecting harmful memes and their targets. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pages 2783–2796, Association for Computational Linguistics, Online.
  320. Can LLM Generate Culturally Relevant Commonsense QA Data? Case Study in Indonesian and Sundanese. ArXiv preprint, abs/2402.17302.
  322. Learning transferable visual models from natural language supervision. In Proceedings of the 38th International Conference on Machine Learning, ICML 2021, 18-24 July 2021, Virtual Event, volume 139 of Proceedings of Machine Learning Research, pages 8748–8763, PMLR.
  323. AART: AI-assisted red-teaming with diverse data generation for new LLM-powered applications. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 380–395, Association for Computational Linguistics, Singapore.
  324. Direct preference optimization: Your language model is secretly a reward model. In Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10 - 16, 2023.
  325. A cross-cultural analysis of social norms in Bollywood and Hollywood movies. ArXiv preprint, abs/2402.11333.
  326. Hierarchical text-conditional image generation with CLIP latents. ArXiv preprint, abs/2204.06125.
  327. Ramezani, Aida and Yang Xu. 2023. Knowledge of cultural moral norms in large language models. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 428–446, Association for Computational Linguistics, Toronto, Canada.
  328. Ramponi, Alan. 2024. Language varieties of Italy: Technology challenges and opportunities. Transactions of the Association for Computational Linguistics, 12:19–38.
  329. NormAd: A benchmark for measuring the cultural adaptability of large language models. ArXiv preprint, abs/2404.12464.
  330. Generating diverse high-fidelity images with VQ-VAE-2. In Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pages 14837–14847.
  331. Creating standardized video recordings of multimodal interactions across cultures. pages 138–159.
  332. Avec 2019 workshop and challenge: state-of-mind, detecting depression with ai, and cross-cultural affect recognition. In Proceedings of the 9th International on Audio/visual Emotion Challenge and Workshop, pages 3–12.
  333. Hate-Speech and Offensive Language Detection in Roman Urdu. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 2512–2522, Association for Computational Linguistics, Online.
  334. Choice of plausible alternatives: An evaluation of commonsense causal reasoning. In 2011 AAAI spring symposium series.
  335. The idioms and culture-specific items translation strategy for a classic novel. Journey: Journal of English Language and Pedagogy, 5(2):169–181.
  336. The dollar street dataset: Images representing the geographic and socioeconomic diversity of the world. In Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, November 28 - December 9, 2022.
  337. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10684–10695.
  338. CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark. ArXiv preprint, abs/2406.05967.
  339. DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023, Vancouver, BC, Canada, June 17-24, 2023, pages 22500–22510, IEEE.
  340. Photorealistic text-to-image diffusion models with deep language understanding. In Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, November 28 - December 9, 2022.
  341. IndiBias: A benchmark dataset to measure social biases in language models for Indian context. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 8786–8806, Association for Computational Linguistics, Mexico City, Mexico.
  342. A rose by any other name would not smell as sweet: Social bias in names mistranslation. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 3933–3945, Association for Computational Linguistics, Singapore.
  343. Revisiting the classics: A study on identifying and rectifying gender stereotypes in rhymes and poems. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 14092–14102, ELRA and ICCL, Torino, Italia.
  344. Whose opinions do language models reflect? In International Conference on Machine Learning, ICML 2023, 23-29 July 2023, Honolulu, Hawaii, USA, volume 202 of Proceedings of Machine Learning Research, pages 29971–30004, PMLR.
  345. Annotators with attitudes: How annotator beliefs and identities bias toxic language detection. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 5884–5906, Association for Computational Linguistics, Seattle, United States.
  346. Sapinski, Tomasz and Dorota Kamińska. 2015. Emotion recognition from natural speech – emotional profiles.
  347. Latvian national corpora collection – korpuss.lv. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 5123–5129, European Language Resources Association, Marseille, France.
  348. Saxon, Michael and William Yang Wang. 2023. Multilingual conceptual coverage in text-to-image models. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 4831–4848, Association for Computational Linguistics, Toronto, Canada.
  349. BLOOM: A 176B-Parameter Open-Access Multilingual Language Model. Working paper or preprint.
  350. Schneider, Florian and Sunayana Sitaram. 2024. M5 – a diverse benchmark to assess the performance of large multimodal models across multilingual and multicultural vision-language tasks. ArXiv preprint, abs/2407.03791.
  351. Schramm, W. 1954. How Communication Works, volume 586. The process and effects of mass communication/University of Illinois Press.
  352. LAION-5B: an open large-scale dataset for training next generation image-text models. In Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, November 28 - December 9, 2022.
  353. Schwartz, Shalom H. 2012. An overview of the schwartz theory of basic values. Online readings in Psychology and Culture, 2(1):11.
  354. Modelling human factors in perceptual multimedia quality: On the role of personality and culture. Proceedings of the 23rd ACM international conference on Multimedia.
  355. The influence of culture on visual perception.
  356. Jais and jais-chat: Arabic-centric foundation and instruction-tuned open generative large language models. ArXiv preprint, abs/2308.16149.
  357. DOSA: A Dataset of Social Artifacts from Different Indian Geographical Subcultures. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 5323–5337, ELRA and ICCL, Torino, Italia.
  358. Shah, Priyanshi and Ziad Kobti. 2020. Multimodal fake news detection using a cultural algorithm with situational and normative knowledge. 2020 IEEE Congress on Evolutionary Computation (CEC), pages 1–7.
  359. Modeling cross-cultural pragmatic inference with codenames duet. In Findings of the Association for Computational Linguistics: ACL 2023, pages 6550–6569, Association for Computational Linguistics, Toronto, Canada.
  360. Detecting and understanding harmful memes: A survey. In Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, IJCAI 2022, Vienna, Austria, 23-29 July 2022, pages 5597–5606, ijcai.org.
  361. Understanding the capabilities and limitations of large language models for cultural commonsense. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 5668–5680, Association for Computational Linguistics, Mexico City, Mexico.
  362. Social Norms-Grounded Machine Ethics in Complex Narrative Situation. In Proceedings of the 29th International Conference on Computational Linguistics, pages 1333–1343, International Committee on Computational Linguistics, Gyeongju, Republic of Korea.
  363. Culturebank: An online community-driven knowledge base towards culturally aware language technologies. ArXiv preprint, abs/2404.15238.
  364. Shifman, Limor. 2013. Memes in digital culture. MIT press.
  365. AI models collapse when trained on recursively generated data. Nature, 631(8022):755–759.
  366. Shwartz, Vered. 2022. Good night at 4 pm?! time expressions in different cultures. In Findings of the Association for Computational Linguistics: ACL 2022, pages 2842–2853, Association for Computational Linguistics, Dublin, Ireland.
  367. Singh, Akshay and Rahul Thakur. 2024. Generalizable Multilingual Hate Speech Detection on Low Resource Indian Languages using Fair Selection in Federated Learning. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 7211–7221, Association for Computational Linguistics, Mexico City, Mexico.
  368. Kmmlu: Measuring massive multitask language understanding in korean. ArXiv preprint, abs/2402.11548.
  369. HAE-RAE Bench: Evaluation of Korean Knowledge in Language Models. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 7993–8007, ELRA and ICCL, Torino, Italia.
  370. The typing cure: Experiences with large language model chatbots for mental health support. ArXiv preprint, abs/2401.14362.
  371. Conceptnet 5.5: An open multilingual graph of general knowledge. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, February 4-9, 2017, San Francisco, California, USA, pages 4444–4451, AAAI Press.
  372. WIT: wikipedia-based image text dataset for multimodal multilingual machine learning. In SIGIR ’21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, Canada, July 11-15, 2021, pages 2443–2449, ACM.
  373. Artpedia: A new visual-semantic dataset with visual and contextual sentences in the artistic domain. In Image Analysis and Processing–ICIAP 2019: 20th International Conference, Trento, Italy, September 9–13, 2019, Proceedings, Part II 20, pages 729–740, Springer.
  374. Exploiting cultural biases via homoglyphs in text-to-image synthesis. Journal of Artificial Intelligence Research, 78:1017–1068.
  375. Survey, World Values. 2022. World values survey.
  376. ThatiAR: Subjectivity Detection in Arabic News Sentences. ArXiv preprint, abs/2406.05559.
  377. CommonsenseQA: A question answering challenge targeting commonsense knowledge. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4149–4158, Association for Computational Linguistics, Minneapolis, Minnesota.
  378. CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4149–4158, Association for Computational Linguistics, Minneapolis, Minnesota.
  379. MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering. ArXiv preprint, abs/2405.11985.
  380. CHisIEC: An information extraction corpus for Ancient Chinese history. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 3192–3202, ELRA and ICCL, Torino, Italia.
  381. Cultural bias and cultural alignment of large language models. PNAS nexus, 3(9):pgae346.
  382. Exploring document-level literary machine translation with parallel paragraphs from world literature. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 9882–9902, Association for Computational Linguistics, Abu Dhabi, United Arab Emirates.
  383. Crossmodal-3600: A massively multilingual multimodal evaluation dataset. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 715–729, Association for Computational Linguistics, Abu Dhabi, United Arab Emirates.
  384. A dataset for metaphor detection in early medieval Hebrew poetry. In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 2: Short Papers), pages 443–453, Association for Computational Linguistics, St. Julian’s, Malta.
  385. EthioLLM: Multilingual large language models for Ethiopian languages with task evaluation. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 6341–6352, ELRA and ICCL, Torino, Italia.
  386. From languages to geographies: Towards evaluating cultural bias in hate speech datasets. In Proceedings of the 8th Workshop on Online Abuse and Harms (WOAH 2024), pages 283–311, Association for Computational Linguistics, Mexico City, Mexico.
  387. Toral, Antonio and Andy Way. 2015. Machine-assisted translation of literary text: A case study. Translation Spaces, 4(2):240–267.
  388. Törnberg, Petter. 2024. How to Use Large-Language Models for Text Analysis. London.
  389. Are fairy tales fair? analyzing gender bias in temporal narrative event chains of children’s fairy tales. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 6509–6531, Association for Computational Linguistics, Toronto, Canada.
  390. Uccix: Irish-excellence large language model. ArXiv preprint, abs/2405.13010.
  391. Personalized adaptation with pre-trained speech encoders for continuous emotion recognition. In 24th Annual Conference of the International Speech Communication Association, Interspeech 2023, Dublin, Ireland, August 20-24, 2023, pages 636–640, ISCA.
  392. Detecting Cybercrimes in Accordance with Pakistani Law: Dataset and Evaluation Using PLMs. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 4717–4728, ELRA and ICCL, Torino, Italia.
  393. Aya model: An instruction finetuned open-access multilingual language model. ArXiv preprint, abs/2402.07827.
  394. Navigating Cultural Chasms: Exploring and Unlocking the Cultural POV of Text-To-Image Models. ArXiv preprint, abs/2310.01929.
  395. A treebank of Asia minor Greek. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 1715–1721, ELRA and ICCL, Torino, Italia.
  396. Measuring algorithmically infused societies. Nature, 595(7866):197–204.
  397. Universals and cultural differences in forming personality trait judgments from faces. Social Psychological and Personality Science, 2(6):609–617.
  398. Sonnet or not, bot? poetry evaluation for large models and datasets. ArXiv preprint, abs/2406.18906.
  399. Large language models should not replace human participants because they can misportray and flatten identity groups. ArXiv preprint, abs/2402.01908.
  400. Craft: Extracting and tuning cultural instructions from the wild. ArXiv preprint, abs/2405.03138.
  401. SeaEval for multilingual foundation models: From cross-lingual alignment to cultural reasoning. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 370–390, Association for Computational Linguistics, Mexico City, Mexico.
  402. Multihateclip: A multilingual benchmark dataset for hateful video detection on youtube and bilibili. ArXiv preprint, abs/2408.03468.
  403. Not all countries celebrate thanksgiving: On the cultural dominance in large language models. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 6349–6384, Association for Computational Linguistics, Bangkok, Thailand.
  404. CMB: A comprehensive medical benchmark in Chinese. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 6184–6205, Association for Computational Linguistics, Mexico City, Mexico.
  405. Super-NaturalInstructions: Generalization via declarative instructions on 1600+ NLP tasks. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 5085–5109, Association for Computational Linguistics, Abu Dhabi, United Arab Emirates.
  406. Cdeval: A benchmark for measuring the cultural dimensions of large language models. ArXiv preprint, abs/2311.16421.
  407. Stream: social data and knowledge collective intelligence platform for training ethical ai models. AI & SOCIETY, pages 1–9.
  408. Cvlue: A new benchmark dataset for chinese vision-language understanding evaluation.
  409. Humanoid agents: Platform for simulating human-like generative agents. arXiv preprint arXiv:2310.05418.
  410. PARIKSHA: A Large-Scale Investigation of Human-LLM Evaluator Agreement on Multilingual and Multi-Cultural Data. CoRR.
  411. WAUGH, LINDA R. 1982. Marked and unmarked: A choice between unequals in semiotic structure. Semiotica, 38(3-4):299–318.
  412. Weber, James and Michael J Urick. 2017. Examining the millennials’ ethical profile: Assessing demographic variations in their personal value orientations. Business and Society Review, 122(4):469–506.
  413. Wordscape: a pipeline to extract multilingual, visually rich documents with layout annotations from web crawl data. In Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10 - 16, 2023.
  414. Muchomusic: Evaluating music understanding in multimodal audio-language models. ArXiv preprint, abs/2408.01337.
  415. Ac-eval: Evaluating ancient chinese language understanding in large language models. ArXiv preprint, abs/2403.06574.
  416. Sociotechnical safety evaluation of generative ai systems. ArXiv preprint, abs/2310.11986.
  417. White, Leslie A. 1959. The concept of culture. American Anthropologist, 61(2):227–251.
  418. COPAL-ID: Indonesian Language Reasoning with Local Culture and Nuances. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 1404–1422, Association for Computational Linguistics, Mexico City, Mexico.
  419. Revealing fine-grained values and opinions in large language models. ArXiv preprint, abs/2406.19238.
  420. (perhaps) beyond human translation: Harnessing multi-agent collaboration for translating ultra-long literary texts. ArXiv preprint, abs/2405.11804.
  421. Wunarso, Novita Belinda and Yustinus Eko Soelistio. 2017. Towards indonesian speech-emotion automatic recognition (i-spear). In 2017 4th International Conference on New Media Studies (CONMEDIA), pages 98–101, IEEE.
  422. Rtp-lx: Can llms evaluate toxicity in multilingual scenarios? ArXiv preprint, abs/2404.14397.
  423. Würtz, Elizabeth. 2017. Intercultural Communication on Web sites: a Cross-Cultural Analysis of Web sites from High-Context Cultures and Low-Context Cultures. Journal of Computer-Mediated Communication, 11(1):274–299.
  424. Multimodal Cross-Lingual Features and Weight Fusion for Cross-Cultural Humor Detection. In Proceedings of the 4th on Multimodal Sentiment Analysis Challenge and Workshop: Mimicked Emotions, Humour and Personalisation, MuSe ’23, pages 51–57, Association for Computing Machinery, New York, NY, USA.
  425. When search engine services meet large language models: Visions and challenges. IEEE Transactions on Services Computing.
  426. Attngan: Fine-grained text to image generation with attentional generative adversarial networks. In 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pages 1316–1324, IEEE Computer Society.
  427. mT5: A massively multilingual pre-trained text-to-text transformer. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 483–498, Association for Computational Linguistics, Online.
  428. Analyzing social biases in japanese large language models. ArXiv preprint, abs/2406.02050.
  429. Social Skill Training with Large Language Models. ArXiv preprint, abs/2404.04204.
  430. Benchmarking llm-based machine translation on cultural awareness. ArXiv preprint, abs/2305.14328.
  431. Value FULCRA: Mapping large language models to the multidimensional spectrum of basic human value. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 8762–8785, Association for Computational Linguistics, Mexico City, Mexico.
  432. GOLEM: GOld standard for learning and evaluation of motifs. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 7801–7813, ELRA and ICCL, Torino, Italia.
  433. Computer vision datasets and models exhibit cultural and linguistic diversity in perception.
  434. Altdiffusion: A multilingual text-to-image diffusion model. In Thirty-Eighth AAAI Conference on Artificial Intelligence, AAAI 2024, Thirty-Sixth Conference on Innovative Applications of Artificial Intelligence, IAAI 2024, Fourteenth Symposium on Educational Advances in Artificial Intelligence, EAAI 2014, February 20-27, 2024, Vancouver, Canada, pages 6648–6656, AAAI Press.
  435. GeoMLAMA: Geo-diverse commonsense probing on multilingual pre-trained language models. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 2039–2055, Association for Computational Linguistics, Abu Dhabi, United Arab Emirates.
  436. GIVL: improving geographical inclusivity of vision-language models with pre-training methods. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023, Vancouver, BC, Canada, June 17-24, 2023, pages 10951–10961, IEEE.
  437. Broaden the Vision: Geo-Diverse Visual Commonsense Reasoning. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 2115–2129, Association for Computational Linguistics, Online and Punta Cana, Dominican Republic.
  438. Chinese morpheme-informed evaluation of large language models. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 3165–3178, ELRA and ICCL, Torino, Italia.
  439. Should we respect llms? a cross-lingual study on the influence of prompt politeness on llm performance. ArXiv preprint, abs/2402.14531.
  440. Hyperclova x technical report. ArXiv preprint, abs/2404.01954.
  441. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions. Transactions of the Association for Computational Linguistics, 2:67–78.
  442. CMoralEval: A Moral Evaluation Benchmark for Chinese Large Language Models. In Findings of the Association for Computational Linguistics ACL 2024, pages 11817–11837, Association for Computational Linguistics, Bangkok, Thailand and virtual meeting.
  443. Frechet inception distance (fid) for evaluating gans. China University of Mining Technology Beijing Graduate School, 3.
  444. Measuring Social Norms of Large Language Models. In Findings of the Association for Computational Linguistics: NAACL 2024, pages 650–699, Association for Computational Linguistics, Mexico City, Mexico.
  445. Yun, Youngsik and Jihie Kim. 2024. Cic: A framework for culturally-aware image captioning. ArXiv preprint, abs/2402.05374.
  446. Turkishmmlu: Measuring massive multitask language understanding in turkish. ArXiv preprint, abs/2407.12402.
  447. SWAG: A large-scale adversarial dataset for grounded commonsense inference. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 93–104, Association for Computational Linguistics, Brussels, Belgium.
  448. RENOVI: A benchmark towards remediating norm violations in socio-cultural conversations. In Findings of the Association for Computational Linguistics: NAACL 2024, pages 3104–3117, Association for Computational Linguistics, Mexico City, Mexico.
  449. MC2: Towards transparent and culturally-aware NLP for minority languages in China. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 8832–8850, Association for Computational Linguistics, Bangkok, Thailand.
  450. Speechagents: Human-communication simulation with multi-modal multi-agent systems. ArXiv preprint, abs/2401.03945.
  451. Can vision-language models be a good guesser? exploring vlms for times and location reasoning. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 636–645.
  452. Stackgan: Text to photo-realistic image synthesis with stacked generative adversarial networks. In IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22-29, 2017, pages 5908–5916, IEEE Computer Society.
  453. Partiality and Misconception: Investigating Cultural Representativeness in Text-to-Image Models. In Proceedings of the CHI Conference on Human Factors in Computing Systems, CHI 2024, Honolulu, HI, USA, May 11-16, 2024, pages 620:1–620:25, ACM.
  454. Aligning vision models with human aesthetics in retrieval: Benchmarks and algorithms. ArXiv preprint, abs/2406.09397.
  455. Cultiverse: Towards cross-cultural understanding for paintings with large language model. ArXiv preprint, abs/2405.00435.
  456. Methodology of adapting large english language models for specific cultural contexts. ArXiv preprint, abs/2406.18192.
  457. M3exam: A multilingual, multimodal, multilevel benchmark for examining large language models. In Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10 - 16, 2023.
  458. Interpreting themes from educational stories. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 9190–9203, ELRA and ICCL, Torino, Italia.
  459. M6-ufc: Unifying multi-modal controls for conditional image synthesis via non-autoregressive generative transformers. ArXiv preprint, abs/2105.14211.
  460. CHBias: Bias evaluation and mitigation of Chinese conversational language models. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 13538–13556, Association for Computational Linguistics, Toronto, Canada.
  461. M3ED: Multi-modal multi-scene multi-label emotional dialogue database. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 5699–5710, Association for Computational Linguistics, Dublin, Ireland.
  462. WorldValuesBench: A Large-Scale Benchmark Dataset for Multi-Cultural Value Awareness of Language Models. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 17696–17706, ELRA and ICCL, Torino, Italia.
  463. Judging llm-as-a-judge with mt-bench and chatbot arena. In Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10 - 16, 2023.
  464. Beyond Preferences in AI Alignment. ArXiv preprint, abs/2408.16984.
  465. WYWEB: A NLP evaluation benchmark for classical Chinese. In Findings of the Association for Computational Linguistics: ACL 2023, pages 3294–3319, Association for Computational Linguistics, Toronto, Canada.
  466. Cultural compass: Predicting transfer learning success in offensive language detection with cultural features. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 12684–12702, Association for Computational Linguistics, Singapore.
  467. Cultural Compass: Predicting Transfer Learning Success in Offensive Language Detection with Cultural Features. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 12684–12702, Association for Computational Linguistics, Singapore.
  468. Does Mapo Tofu Contain Coffee? Probing LLMs for Food-related Cultural Knowledge. ArXiv preprint, abs/2404.06833.
  469. VLUE: A Multi-Task Multi-Dimension Benchmark for Evaluating Vision-Language Pre-training. In International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA, volume 162 of Proceedings of Machine Learning Research, pages 27395–27411, PMLR.
  470. Quite good, but not enough: Nationality bias in large language models - a case study of ChatGPT. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 13489–13502, ELRA and ICCL, Torino, Italia.
  471. NormBank: A knowledge bank of situational social norms. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 7756–7776, Association for Computational Linguistics, Toronto, Canada.
  472. Can Large Language Models Transform Computational Social Science? Computational Linguistics, 50(1):237–291.
  473. The Moral Integrity Corpus: A Benchmark for Ethical Dialogue Systems. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 3755–3773, Association for Computational Linguistics, Dublin, Ireland.
Authors (10)
  1. Siddhesh Pawar (4 papers)
  2. Junyeong Park (21 papers)
  3. Jiho Jin (15 papers)
  4. Arnav Arora (24 papers)
  5. Junho Myung (14 papers)
  6. Srishti Yadav (10 papers)
  7. Faiz Ghifari Haznitrama (2 papers)
  8. Inhwa Song (3 papers)
  9. Alice Oh (82 papers)
  10. Isabelle Augenstein (131 papers)

Summary

Survey of Cultural Awareness in Language Models: Text and Beyond

As language models are deployed in applications such as chatbots and virtual assistants, their cultural awareness has become a central concern for inclusivity. The paper "Survey of Cultural Awareness in Language Models: Text and Beyond" by Pawar et al. surveys efforts to incorporate cultural awareness into text-based and multimodal LLMs.

The paper begins by defining cultural awareness in LLMs, taking definitions of culture from psychology and anthropology as its point of departure. These disciplines offer complementary perspectives: anthropology focuses on the contextual understanding of human action, while psychology emphasizes the sociocultural grounding of behavior. The authors suggest that cultural awareness in LLMs should include the ability to understand diverse social contexts and to interpret task elements differently across cultures, which goes beyond multilinguality.

In reviewing how culture is integrated into LLMs, the paper distinguishes between data collection strategies and model adaptation. Data collection uses both automatic pipelines, such as large-scale web scraping from culturally marked sources, and manual data creation, in which human annotators ensure cultural precision. The paper also notes the emergence of automatic and model-in-the-loop refinement techniques, which have increased the scale and specificity of cultural datasets. These include culture-specific corpora for languages such as Korean and Arabic, whose creation was previously hindered by resource constraints.

Cultural alignment in LLMs is pursued through two families of methods: training and prompting. Training methods include pre-training from scratch on culturally relevant data and fine-tuning on instructions and specific datasets to align the model with cultural norms, practices, and shared values. Prompting strategies require no additional training; they let a model adjust to cultural cues in the input text at inference time, which improves adaptability across cultural contexts.
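
As a rough illustration of the prompting route described above, the following sketch prepends a cultural-context instruction to a question before it is sent to a model. The function name `build_cultural_prompt` and the template wording are assumptions made for this example; they are not taken from the survey or from any particular library, and no model call is made.

```python
from typing import Optional


def build_cultural_prompt(question: str, culture: Optional[str] = None) -> str:
    """Return the question, optionally prefixed with a cultural-context instruction.

    The template below is a hypothetical example, not a prompt from the paper.
    """
    if culture is None:
        # No cultural cue: pass the question through unchanged.
        return question
    context = (
        f"Answer from the perspective of someone living in {culture}, "
        "following local norms and customs."
    )
    return f"{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    print(build_cultural_prompt("What is a polite way to greet an elder?", "South Korea"))
```

Because the cue lives entirely in the input text, the same model can be queried with different values of `culture` without retraining.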

Evaluation relies on benchmarks that cover commonsense knowledge, social values, norms, biases, and emotion across cultures. These benchmarks assess how well LLMs adapt and also expose discrepancies in model outputs across cultural scenarios, which points to the need for more culturally aware data handling and bias reduction strategies.
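
One simple way to make such cross-cultural discrepancies concrete is to compute accuracy separately per culture and report the spread. The sketch below assumes benchmark results are available as `(culture, prediction, gold)` tuples; the helper names and the toy records are illustrative placeholders, not data or metrics from the survey.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def accuracy_by_culture(records: Iterable[Tuple[str, str, str]]) -> Dict[str, float]:
    """Compute accuracy per culture from (culture, prediction, gold) tuples."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for culture, prediction, gold in records:
        total[culture] += 1
        correct[culture] += int(prediction == gold)
    return {culture: correct[culture] / total[culture] for culture in total}


def max_gap(accuracies: Dict[str, float]) -> float:
    """Difference between the best- and worst-served culture."""
    return max(accuracies.values()) - min(accuracies.values())


if __name__ == "__main__":
    # Toy placeholder records, made up purely to exercise the functions.
    toy = [
        ("A", "yes", "yes"),
        ("A", "no", "no"),
        ("B", "yes", "no"),
        ("B", "no", "no"),
    ]
    acc = accuracy_by_culture(toy)
    print(acc, max_gap(acc))
```

A single aggregate score would hide the gap that this per-culture breakdown surfaces.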

The implications are both practical and theoretical. Practically, culturally aware LLMs promise to reduce cultural biases and improve the quality of user interaction across global contexts. Theoretically, they support more equitable and inclusive AI systems that acknowledge and respect cultural diversity. Future models could potentially draw on multilayered cultural data, allowing them to adapt to cultural shifts and align more accurately with human values, norms, and social expectations.

The survey sits at the intersection of NLP, multimodality (including vision and audio), and the social sciences, and it emphasizes the role of human-computer interaction in driving cultural inclusion. Given rapid technological advances and the expanding scope of AI applications, the paper argues for a shift toward culturally inclusive language technologies that go beyond multilinguality and embed deeper cultural competence. For researchers, this calls for continued focus on dataset diversity, model adaptability, and ethical deployment in support of equitable technology development.