Effect of dimensionality change on the bias of word embeddings (2312.17292v1)
Abstract: Word embedding methods (WEMs) are extensively used for representing text data. The dimensionality of these embeddings varies across tasks and implementations. The effect of dimensionality change on the accuracy of downstream tasks is a well-explored question; however, its effect on the bias of word embeddings has yet to be investigated. Using the English Wikipedia corpus, we study this effect for two static (Word2Vec and fastText) and two context-sensitive (ELMo and BERT) WEMs. We make two observations. First, the bias of word embeddings varies significantly with the dimensionality. Second, there is no uniformity in how the dimensionality change affects the bias of word embeddings. These factors should be considered when selecting the dimensionality of word embeddings.
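A common way to quantify the kind of bias the abstract refers to is the Word Embedding Association Test (WEAT) effect size, which compares how strongly two target word sets associate with two attribute word sets. The sketch below shows how such a measurement could be repeated across dimensionalities; it is a minimal illustration under stated assumptions, not the paper's exact protocol. The word sets, the dimensionality grid, and the random placeholder vectors (standing in for embeddings actually trained at each size) are all assumptions.

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity between two vectors.
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def weat_effect_size(emb, X, Y, A, B):
    # WEAT effect size: difference in mean association of target sets
    # X and Y with attribute sets A and B, normalized by the pooled
    # standard deviation of the per-word associations.
    def assoc(w):
        return (np.mean([cosine(emb[w], emb[a]) for a in A])
                - np.mean([cosine(emb[w], emb[b]) for b in B]))
    sx = [assoc(w) for w in X]
    sy = [assoc(w) for w in Y]
    return (np.mean(sx) - np.mean(sy)) / np.std(sx + sy, ddof=1)

# Hypothetical target/attribute sets (career vs. family terms against
# male vs. female names); the sets the paper actually uses may differ.
X = ["executive", "management", "salary", "office"]
Y = ["home", "parents", "children", "family"]
A = ["john", "paul", "mike", "kevin"]
B = ["amy", "joan", "lisa", "sarah"]

rng = np.random.default_rng(0)
for dim in (50, 100, 200, 300):
    # Placeholder: random vectors stand in for embeddings trained at
    # this dimensionality (e.g. Word2Vec or fastText on Wikipedia).
    emb = {w: rng.normal(size=dim) for w in X + Y + A + B}
    print(f"dim={dim}: WEAT effect size = {weat_effect_size(emb, X, Y, A, B):.3f}")
```

In an actual experiment, the placeholder vectors would be replaced by models trained on the same corpus at each dimensionality (for example, gensim's Word2Vec with `vector_size=dim`), so that any change in the effect size can be attributed to the dimensionality rather than to the training data.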