Backdoor Adjustment of Confounding by Provenance for Robust Text Classification of Multi-institutional Clinical Notes (2310.02451v1)

Published 3 Oct 2023 in cs.CL

Abstract: NLP methods have been broadly applied to clinical tasks. Machine learning and deep learning approaches have been used to improve the performance of clinical NLP. However, these approaches require sufficiently large datasets for training, and trained models have been shown to transfer poorly across sites. These issues have led to the promotion of data collection and integration across different institutions for accurate and portable models, yet pooling data in this way can introduce a form of bias called confounding by provenance. When source-specific data distributions shift at deployment, this bias can harm model performance. To address this issue, we evaluate the utility of backdoor adjustment for text classification on a multi-site dataset of clinical notes annotated for mentions of substance abuse. We assess backdoor adjustment within an evaluation framework devised to measure robustness to distributional shifts. Our results indicate that backdoor adjustment can effectively mitigate confounding shift.
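
For context, the backdoor adjustment referenced in the abstract follows Pearl's standard adjustment formula. A minimal sketch, with the note text as X, the classification label as Y, and the note's source institution (its provenance) assumed to play the role of the observed confounder Z (the paper's exact formulation may differ in detail):

\[
P(Y \mid \mathrm{do}(X = x)) \;=\; \sum_{z} P(Y \mid X = x,\, Z = z)\, P(Z = z)
\]

Roughly, this amounts to training a classifier that conditions on both the text and its source, then averaging its predictions over an estimate of the source distribution rather than relying on whichever mixture of sources happened to appear in the training data.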
