Gender bias in (non)-contextual clinical word embeddings for stereotypical medical categories (2208.01341v2)

Published 2 Aug 2022 in cs.CL and cs.CY

Abstract: Clinical word embeddings are extensively used in many Bio-NLP problems as state-of-the-art feature vector representations. Although they capture word semantics well, the datasets on which they are trained may carry statistical and societal biases, and the embeddings may therefore encode gender stereotypes. This study analyses the gender bias of clinical embeddings across three medical categories: mental disorders, sexually transmitted diseases, and personality traits. To this end, we analyze two pre-trained embeddings: (contextualized) clinical-BERT and (non-contextualized) BioWordVec. We show that both embeddings are biased towards sensitive gender groups, and that BioWordVec exhibits higher bias than clinical-BERT in all three categories. Moreover, our analyses show that clinical embeddings carry a high degree of bias for some medical terms and diseases, in conflict with the medical literature. Such ill-founded associations may cause harm in downstream applications that use clinical embeddings.
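As a rough illustration of the kind of analysis the abstract describes (not the authors' exact protocol), a common way to probe gender bias in non-contextualized embeddings like BioWordVec is to compute the cosine similarity between a medical term's vector and a "gender direction" built from gendered word pairs. The sketch below uses randomly initialized toy vectors standing in for pretrained embeddings; the word lists and the 200-dimensional size are illustrative assumptions.

```python
import numpy as np

# Toy stand-in for BioWordVec-style static embeddings; in practice these
# would be loaded from the pretrained model (e.g., via gensim KeyedVectors).
rng = np.random.default_rng(0)
vocab = ["he", "him", "man", "she", "her", "woman",
         "depression", "schizophrenia", "syphilis"]
emb = {w: rng.normal(size=200) for w in vocab}

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# A simple gender direction: mean of female-associated terms minus
# mean of male-associated terms.
female = np.mean([emb[w] for w in ["she", "her", "woman"]], axis=0)
male = np.mean([emb[w] for w in ["he", "him", "man"]], axis=0)
gender_direction = female - male

# Positive scores lean "female", negative lean "male"; a strong lean for a
# disease term that the medical literature does not support would indicate
# the kind of ill-founded association the paper warns about.
for term in ["depression", "schizophrenia", "syphilis"]:
    print(term, round(cosine(emb[term], gender_direction), 3))
```

For contextualized models like clinical-BERT, the analogous measurement is typically done on contextual representations extracted from template sentences rather than on a fixed per-word vector, which is one reason the two embedding types are compared separately.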

Authors (4)
  1. Fabian Mijsters (1 paper)
  2. Amar van Uden (1 paper)
  3. Jelle Peperzak (1 paper)
  4. Gizem Sogancioglu (3 papers)
Citations (3)
