Should We Respect LLMs? A Cross-Lingual Study on the Influence of Prompt Politeness on LLM Performance (2402.14531v1)
Abstract: We investigate the impact of prompt politeness on the performance of LLMs. In human communication, polite language often elicits greater compliance and effectiveness, while rudeness can provoke aversion and degrade response quality. We hypothesize that LLMs mirror these human communication traits, suggesting they align with human cultural norms. We assess the effect of prompt politeness on LLMs across English, Chinese, and Japanese tasks. We observe that impolite prompts often degrade performance, but overly polite language does not guarantee better outcomes, and the optimal politeness level differs by language. This suggests that LLMs not only reflect human behavior but are also sensitive to language, particularly across cultural contexts. Our findings highlight the need to account for politeness in cross-cultural natural language processing and LLM usage.
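The experimental setup the abstract describes can be sketched as follows. This is a minimal illustration, not the authors' actual code: the politeness templates, the level numbering, and the `query_llm` callable are all hypothetical placeholders standing in for the paper's prompts and model APIs.

```python
# Sketch: render the same task instruction at several politeness levels
# and compare model accuracy across them. Templates are illustrative only.

POLITENESS_TEMPLATES = {
    # Level 1 = most polite; higher numbers = ruder (illustrative scale)
    1: "Could you please answer the following question? {question}",
    2: "Please answer the following question. {question}",
    3: "Answer the following question. {question}",
    4: "Answer this question now: {question}",
}

def build_prompts(question: str) -> dict[int, str]:
    """Render one question at every politeness level."""
    return {level: tpl.format(question=question)
            for level, tpl in POLITENESS_TEMPLATES.items()}

def evaluate(questions, answers, query_llm):
    """Score exact-match accuracy per politeness level.

    `query_llm` is a user-supplied callable (prompt -> model answer);
    no real model API is assumed here.
    """
    scores = {level: 0 for level in POLITENESS_TEMPLATES}
    for question, gold in zip(questions, answers):
        for level, prompt in build_prompts(question).items():
            if query_llm(prompt).strip() == gold:
                scores[level] += 1
    return {level: hits / len(questions) for level, hits in scores.items()}
```

In practice the paper's tasks include summarization and multi-task benchmarks, so exact match would be replaced by task-appropriate metrics (e.g. ROUGE or benchmark accuracy); the per-level comparison structure is the point of the sketch.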