
For the Misgendered Chinese in Gender Bias Research: Multi-Task Learning with Knowledge Distillation for Pinyin Name-Gender Prediction (2405.06221v1)

Published 10 May 2024 in cs.CL and cs.CY

Abstract: Achieving gender equality is a pivotal factor in realizing the UN's Global Goals for Sustainable Development. Gender bias studies work towards this goal and rely on name-based gender inference tools to assign individual gender labels when gender information is unavailable. However, these tools often predict gender inaccurately for Chinese Pinyin names, which can bias such studies. With the growing participation of Chinese people in international activities, this problem is becoming more severe. Specifically, current tools focus on pronunciation (Pinyin) information and neglect the fact that the latent connections between Pinyin and the underlying Chinese characters (Hanzi) convey critical information. As a first effort, we formulate the Pinyin name-gender guessing problem and design a Multi-Task Learning Network assisted by Knowledge Distillation that enables the Pinyin embeddings in the model to possess semantic features of Chinese characters and to learn gender information from Chinese character names. Our open-sourced method surpasses commercial name-gender guessing tools by 9.70% to 20.08% in relative terms, and also outperforms the state-of-the-art algorithms.
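To make the idea of a multi-task objective assisted by knowledge distillation more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not specify the loss, the auxiliary task, or the teacher model. The sketch assumes a Pinyin-based student that predicts gender, a hypothetical auxiliary classification task (for example, guessing a Hanzi class from Pinyin), and a hypothetical teacher trained on Chinese character names whose softened predictions the student imitates, following the standard temperature-scaled distillation loss. All function names, weights (`alpha`, `beta`), and the temperature are illustrative assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """Negative log-likelihood of the true class index."""
    return -math.log(probs[label])


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def combined_loss(student_logits, teacher_logits, gender_label,
                  aux_logits, aux_label,
                  alpha=0.5, beta=0.5, temperature=2.0):
    """Hypothetical multi-task loss with a distillation term.

    gender task loss + alpha * auxiliary task loss
                     + beta * T^2 * KL(teacher || student)
    """
    gender_ce = cross_entropy(softmax(student_logits), gender_label)
    aux_ce = cross_entropy(softmax(aux_logits), aux_label)
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    distill = temperature ** 2 * kl_divergence(soft_teacher, soft_student)
    return gender_ce + alpha * aux_ce + beta * distill


if __name__ == "__main__":
    # Made-up logits for one name; classes are [female, male].
    loss = combined_loss(
        student_logits=[0.2, 1.1],     # Pinyin-based student
        teacher_logits=[-0.5, 2.0],    # Hanzi-based teacher (assumed)
        gender_label=1,
        aux_logits=[0.5, -0.5, 0.0],   # hypothetical auxiliary task
        aux_label=0,
    )
    print(f"combined loss: {loss:.4f}")
```

The distillation term vanishes when the student already matches the teacher, so in this sketch the teacher only influences training where the Pinyin-based predictions diverge from the character-based ones.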

