Measuring and Modeling Culture in LLMs: A Survey Overview
The paper "Towards Measuring and Modeling 'Culture' in LLMs: A Survey" examines the intersection between culture and LLMs, focusing on how cultural representation, inclusion, and bias are evaluated. It reviews 39 papers on this topic, highlighting their methodologies, results, and the gaps in the current body of literature. The survey underscores the complexity of defining "culture," noting that none of the reviewed papers provides a conclusive definition; they rely instead on various cultural proxies within their datasets.
Cultural Proxies and Dimensions
The paper organizes the study of culture along three main dimensions: demographic proxies, semantic proxies, and language-culture interaction axes.
- Demographic Proxies: This dimension includes aspects such as region, language, gender, race, religion, and ethnicity. Region and language often serve as prevalent proxies for culture, but the paper notes that cultural studies involving other dimensions like gender and ethnicity are influenced significantly by Western-centric diversity narratives.
- Semantic Proxies: While the majority of studies focus on semantic proxies like emotions and values, the survey identifies a lack of research across the full spectrum of semantic domains, such as kinship terms or physical world concepts.
- Language-Culture Interaction: Based on the framework of Hershcovich et al. (2022), this dimension categorizes interactions into aboutness, common ground, and objectives/values. The authors found that many papers concentrate on objectives and values, while aboutness remains largely unexamined.
Methodologies for Probing Culture in LLMs
The survey categorizes the methodologies used to assess culture within LLMs into black-box and white-box approaches. The predominant method involves black-box probing, where LLMs are queried with culture-specific prompts and their responses analyzed. These techniques are sub-categorized into discriminative probing, where models select from given options, and generative probing, which involves free-text generation by the models. The authors critique the robustness of current probing methods, highlighting issues such as sensitivity to prompts and limited interpretability.
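The distinction between discriminative and generative probing, and the prompt-sensitivity concern, can be illustrated with a minimal Python sketch. It is not taken from the survey: the function names, the `Model` type, the prompt wording, and the `toy_model` stand-in are all hypothetical, and the consistency score is just one simple way to quantify how much answers shift across paraphrased prompts.

```python
from collections import Counter
from typing import Callable, Optional, Sequence

# Assumption: a "model" is any function mapping a prompt string to a text reply.
# In practice this would wrap a call to an actual LLM.
Model = Callable[[str], str]


def discriminative_probe(model: Model, question: str, options: Sequence[str]) -> Optional[str]:
    """Black-box discriminative probe: the model must pick one of the given options."""
    labels = "ABCDEFGH"[: len(options)]
    listed = "\n".join(f"{label}. {opt}" for label, opt in zip(labels, options))
    prompt = f"{question}\n{listed}\nAnswer with a single letter."
    reply = model(prompt).strip().upper()
    if reply and reply[0] in labels:
        return options[labels.index(reply[0])]
    return None  # the reply could not be mapped to an option


def generative_probe(model: Model, question: str) -> str:
    """Black-box generative probe: the model answers in free text."""
    return model(question).strip()


def prompt_consistency(model: Model, paraphrases: Sequence[str], options: Sequence[str]) -> float:
    """Share of paraphrased prompts that yield the most frequent answer (1.0 = fully stable)."""
    if not paraphrases:
        raise ValueError("at least one paraphrase is required")
    answers = [discriminative_probe(model, p, options) for p in paraphrases]
    most_common_count = Counter(answers).most_common(1)[0][1]
    return most_common_count / len(answers)


def toy_model(prompt: str) -> str:
    """Hypothetical deterministic stand-in for an LLM, used only for illustration."""
    return "A" if "usually" in prompt else "B"
```

With `toy_model`, a single added word in the prompt flips the selected option, which is the kind of prompt sensitivity the authors criticize. A consistency score well below 1.0 on a real model would suggest that a probing result reflects the phrasing as much as the model's underlying behavior.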
Identified Gaps and Recommendations
The paper identifies three critical gaps: (1) limited exploration and coverage of cultural facets, with most work focusing on values and norms; (2) limited robustness and reliability of probing methods; and (3) an absence of contextual, situated studies that evaluate LLMs in practical applications. To address these gaps, the authors offer several recommendations:
- Definitional Clarifications: Future research should clearly specify the cultural proxies and situate studies within a broader cultural context.
- Diverse Cultural Domains: There is a need for wider exploration across various semantic domains and linguistic-cultural interactions.
- Interdisciplinary Collaboration: Bridging with anthropology, HCI, and ICTD could offer deeper insights and understanding of cultural nuances.
- Increased Focus on Multilingual Datasets: More culturally nuanced and non-translatable datasets should be developed to better reflect and study cultural interactions in LLMs.
Conclusion
This survey provides a critical assessment of the current state of research on culture in LLMs, offering a foundational taxonomy and identifying methodological and conceptual weaknesses in existing work. It advances the understanding of how LLMs interact with multifaceted cultural aspects and offers a blueprint for future research aimed at better cultural representation and inclusion in AI systems.