
Does Mapo Tofu Contain Coffee? Probing LLMs for Food-related Cultural Knowledge (2404.06833v2)

Published 10 Apr 2024 in cs.CL

Abstract: Recent studies have highlighted the presence of cultural biases in LLMs, yet often lack a robust methodology to dissect these phenomena comprehensively. Our work aims to bridge this gap by delving into the Food domain, a universally relevant yet culturally diverse aspect of human life. We introduce FmLAMA, a multilingual dataset centered on food-related cultural facts and variations in food practices. We analyze LLMs across various architectures and configurations, evaluating their performance in both monolingual and multilingual settings. By leveraging templates in six different languages, we investigate how LLMs interact with language-specific and cultural knowledge. Our findings reveal that (1) LLMs demonstrate a pronounced bias towards food knowledge prevalent in the United States; (2) Incorporating relevant cultural context significantly improves LLMs' ability to access cultural knowledge; (3) The efficacy of LLMs in capturing cultural nuances is highly dependent on the interplay between the probing language, the specific model architecture, and the cultural context in question. This research underscores the complexity of integrating cultural understanding into LLMs and emphasizes the importance of culturally diverse datasets to mitigate biases and enhance model performance across different cultural domains.

Probing LLMs for Food-Related Cultural Knowledge

The paper, "Does Mapo Tofu Contain Coffee? Probing LLMs for Food-related Cultural Knowledge," presents an in-depth examination of the cultural knowledge embedded within LLMs, focusing on food as a culturally significant domain. The authors introduce a dataset named FmLAMA, designed to assess LLMs' cultural knowledge retrieval concerning food across different languages and cultural traditions.

The primary findings of this paper suggest that current LLMs, despite being trained on diverse corpora, exhibit a notable bias towards food knowledge prevalent in the United States. The research emphasizes that including the relevant cultural context in query prompts significantly enhances the models' ability to retrieve accurate cultural knowledge, and that this capability is highly sensitive to the interplay between the probing language, the model architecture, and the cultural context.
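The effect of supplying cultural context can be illustrated with a small prompt-construction helper. This is a hypothetical sketch: the function name and template wording are illustrative assumptions, not the paper's actual FmLAMA templates.

```python
from typing import Optional


def build_probe(dish: str, country: Optional[str] = None) -> str:
    """Build a cloze-style probe asking for a dish's ingredients.

    Template wording is illustrative; FmLAMA's actual templates may differ.
    Passing a country of origin adds the kind of cultural context that the
    paper finds improves knowledge retrieval.
    """
    if country is not None:
        return f"{dish}, a traditional dish from {country}, contains [MASK]."
    return f"{dish} contains [MASK]."
```

For example, `build_probe("Mapo tofu", "China")` yields a context-enriched probe, while `build_probe("Mapo tofu")` yields the context-free variant the paper shows to be weaker.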

The methodology involves the automated construction of FmLAMA, a dataset that captures the intricacies of cultural knowledge in the domain of food. The authors employ Wikidata to gather extensive food-related data, leveraging language-specific attributes to categorize this information into culturally related content. This approach highlights current LLMs' deficiencies in managing culturally implicit knowledge and underscores the importance of diverse datasets for accurate knowledge retrieval.
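A Wikidata harvest of this kind can be sketched with a SPARQL query. The property and item IDs below (P495 "country of origin", P527 "has part", P31/P279 "instance of"/"subclass of", Q746549 "dish") are real Wikidata identifiers, but the exact query FmLAMA uses is not given here, so treat this as an assumed approximation.

```python
# Illustrative SPARQL query against the public Wikidata endpoint
# (https://query.wikidata.org/sparql); not necessarily the query used
# to build FmLAMA.
FOOD_QUERY = """
SELECT ?dish ?dishLabel ?countryLabel ?ingredientLabel WHERE {
  ?dish wdt:P31/wdt:P279* wd:Q746549 .   # instance of (a subclass of) dish
  ?dish wdt:P495 ?country .              # country of origin
  ?dish wdt:P527 ?ingredient .           # has part -> ingredient
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 100
"""
```

Running this query returns (dish, country, ingredient) triples whose language-specific labels can then be grouped into culturally related subsets, as the paper describes.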

In probing different architectures, such as encoder-only, encoder-decoder, and decoder-only models, the paper reveals variations in their ability to access cultural knowledge. Monolingual English models performed better in English-speaking contexts, while multilingual models did not show definitive superiority in non-English contexts, indicating a potential discrepancy in the pretraining data distribution concerning language and culture.

A significant contribution is the introduction of metrics tailored to cultural knowledge probing: Mean Average Precision (mAP) and Mean Word Similarity (mWS). These metrics move beyond exact string matching, giving credit to ranked and semantically related predictions and thereby offering a more nuanced picture of how well LLMs capture cultural nuances.
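The ranking-based component of such evaluation can be sketched as follows. This is an illustrative implementation of mean average precision over ranked ingredient predictions, not necessarily the paper's exact formulation; the embedding-based mWS metric is omitted here.

```python
def average_precision(predicted, gold):
    """AP for one ranked prediction list against a gold ingredient set.

    Precision is accumulated at each rank where a gold ingredient appears,
    then normalized by the number of gold ingredients.
    """
    gold = set(gold)
    if not gold:
        return 0.0
    hits, score = 0, 0.0
    for rank, item in enumerate(predicted, start=1):
        if item in gold:
            hits += 1
            score += hits / rank
    return score / len(gold)


def mean_average_precision(samples):
    """mAP over (predicted_list, gold_set) pairs, one pair per dish."""
    return sum(average_precision(p, g) for p, g in samples) / len(samples)
```

For instance, ranking "tofu" first and "chili" third for a dish whose gold ingredients are {tofu, chili} yields an AP of (1/1 + 2/3)/2 ≈ 0.83, penalizing the spurious mid-ranked prediction.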

Moreover, the case study on ingredient analysis within the paper offers a practical illustration of LLMs making repetitive, context-independent predictions. This points to an inherent limitation of current LLMs: they rely on generalized assumptions rather than context-aware cultural understanding, particularly concerning diverse culinary practices.

The implications of this research are twofold: practically, it stresses the need for incorporating enhanced datasets that account for cultural diversity to improve model performance in real-world applications. Theoretically, it provides insights into understanding how LLMs encode cultural knowledge, guiding future enhancements in LLM pretraining and fine-tuning strategies to reduce systematic biases.

Looking forward, future developments in AI, especially in natural language processing, could pivot around integrating more culturally diverse data sources. This may help render LLMs not just repositories of information but systems informed by culturally situated knowledge, fostering better cross-cultural understanding and communication.

This paper is a valuable addition to the literature on cultural probing of language models, providing insights and benchmarks that can inform future research on addressing cultural biases and improving the robustness of LLMs in representing globally diverse knowledge.

Authors (8)
  1. Li Zhou
  2. Taelin Karidi
  3. Nicolas Garneau
  4. Yong Cao
  5. Wanlong Liu
  6. Wenyu Chen
  7. Daniel Hershcovich
  8. Haizhou Li