Bridging Cultural Nuances in Dialogue Agents through Cultural Value Surveys (2401.10352v2)
Abstract: The cultural landscape of interactions with dialogue agents is a compelling yet relatively unexplored territory. Sociocultural aspects, from communication styles and beliefs to shared metaphors and knowledge, profoundly impact these interactions. To investigate this dynamic, we introduce cuDialog, a first-of-its-kind benchmark for dialogue generation with a cultural lens. We also develop baseline models capable of extracting cultural attributes from dialogue exchanges, with the goal of enhancing the predictive accuracy and quality of dialogue agents. To effectively co-learn cultural understanding and multi-turn dialogue predictions, we propose to incorporate cultural dimensions with dialogue encoding features. Our experimental findings highlight that incorporating cultural value surveys boosts alignment with references and cultural markers, demonstrating its considerable influence on personalization and dialogue quality. To facilitate further exploration in this domain, we make our benchmark publicly available at https://github.com/yongcaoplus/cuDialog.