
A Comprehensive Study of Gender Bias in Chemical Named Entity Recognition Models (2212.12799v2)

Published 24 Dec 2022 in cs.CL

Abstract: Chemical named entity recognition (NER) models are used in many downstream tasks, from adverse drug reaction identification to pharmacoepidemiology. However, it is unknown whether these models work the same for everyone. Performance disparities can potentially cause harm rather than the intended good. This paper assesses gender-related performance disparities in chemical NER systems. We develop a framework for measuring gender bias in chemical NER models using synthetic data and a newly annotated corpus of over 92,405 words with self-identified gender information from Reddit. Our evaluation of multiple biomedical NER models reveals evident biases. For instance, synthetic data suggests female-related names are frequently misclassified as chemicals, especially for brand name mentions. Additionally, we observe performance disparities between female- and male-associated data in both datasets. Many systems fail to detect contraceptives such as birth control. Our findings emphasize the biases in chemical NER models, urging practitioners to account for these biases in downstream applications.
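The performance disparities mentioned in the abstract can be illustrated with a minimal sketch. It assumes entity-level F1 with exact span matching and a simple female-minus-male difference; the abstract does not specify the paper's metric, so the function names (`f1`, `f1_gap`) and the span representation are hypothetical and chosen only for illustration.

```python
from typing import Set, Tuple

# Hypothetical span representation: (sentence_id, start_offset, end_offset).
Span = Tuple[int, int, int]


def f1(gold: Set[Span], pred: Set[Span]) -> float:
    """Entity-level F1 under exact span matching (an assumed metric)."""
    true_positives = len(gold & pred)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def f1_gap(gold_f: Set[Span], pred_f: Set[Span],
           gold_m: Set[Span], pred_m: Set[Span]) -> float:
    """F1 on female-associated data minus F1 on male-associated data.

    A negative value means the model does worse on female-associated text.
    """
    return f1(gold_f, pred_f) - f1(gold_m, pred_m)


# Toy example: the model misses one chemical mention in the female-associated set.
gold_female = {(0, 0, 5), (1, 3, 9)}
pred_female = {(0, 0, 5)}
gold_male = {(0, 0, 5), (1, 3, 9)}
pred_male = {(0, 0, 5), (1, 3, 9)}

gap = f1_gap(gold_female, pred_female, gold_male, pred_male)
```

In this toy case the gap is negative because one gold mention goes undetected in the female-associated set, the kind of miss the abstract reports for contraceptive mentions.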
