Politeness Stereotypes and Attack Vectors: Gender Stereotypes in Japanese and Korean Language Models (2306.09752v1)
Abstract: In efforts to keep up with the rapid progress and use of LLMs, gender bias research is becoming more prevalent in NLP. Non-English bias research, however, is still in its infancy, with most work focusing on English. In our work, we study how grammatical gender bias relating to politeness levels manifests in Japanese and Korean LLMs. Linguistic studies in these languages have identified a connection between gender bias and politeness levels; however, it is not yet known whether LLMs reproduce these biases. We analyze the relative prediction probabilities of the male and female grammatical genders using templates and find that informal polite speech is most indicative of the female grammatical gender, while rude and formal speech is most indicative of the male grammatical gender. Further, we find politeness levels to be an attack vector for allocational gender bias in cyberbullying detection models: cyberbullies can evade detection through simple techniques that abuse politeness levels. We introduce an attack dataset to (i) identify representational gender bias across politeness levels, (ii) demonstrate how gender biases can be abused to bypass cyberbullying detection models, and (iii) show that allocational biases can be mitigated via training on our proposed dataset. Through our findings, we highlight the importance of bias research moving beyond its current English-centrism.
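The template-based probing the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: the logit values and the `FAKE_LOGITS` lookup are made-up stand-ins for querying a real Japanese or Korean masked language model (e.g. via a fill-mask interface) with a gendered candidate token in a template, and the "relative prediction probability" is taken to be the female token's share of probability mass against the male token.

```python
import math

# Stand-in for a masked-LM query: the (invented) logit a model might
# assign to a candidate gendered token filling a template such as
# "[MASK]が言った: <sentence>". Values are illustrative only.
FAKE_LOGITS = {
    ("informal_polite", "female"): 2.1,
    ("informal_polite", "male"): 1.3,
    ("rude", "female"): 0.7,
    ("rude", "male"): 2.4,
}

def relative_female_prob(style: str) -> float:
    """Female token's probability relative to the male token,
    i.e. P(female) / (P(female) + P(male)) after exponentiating logits."""
    pf = math.exp(FAKE_LOGITS[(style, "female")])
    pm = math.exp(FAKE_LOGITS[(style, "male")])
    return pf / (pf + pm)

for style in ("informal_polite", "rude"):
    print(f"{style}: relative P(female) = {relative_female_prob(style):.3f}")
```

With these illustrative numbers, informal polite speech skews toward the female grammatical gender and rude speech toward the male, mirroring the direction of the effect the abstract reports; a real study would aggregate such scores over many templates and sentences.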
- Victor Steinborn
- Antonis Maronikolakis
- Hinrich Schütze