SODA: A Natural Language Processing Package to Extract Social Determinants of Health for Cancer Studies (2212.03000v2)

Published 6 Dec 2022 in cs.CL, cs.AI, and cs.LG

Abstract: Objective: We aim to develop an open-source NLP package, SODA (i.e., SOcial DeterminAnts), with pre-trained transformer models to extract social determinants of health (SDoH) for cancer patients, examine the generalizability of SODA to a new disease domain (i.e., opioid use), and evaluate the extraction rate of SDoH using cancer populations. Methods: We identified SDoH categories and attributes and developed an SDoH corpus using clinical notes from a general cancer cohort. We compared four transformer-based NLP models to extract SDoH, examined the generalizability of NLP models to a cohort of patients prescribed opioids, and explored customization strategies to improve performance. We applied the best NLP model to extract 19 categories of SDoH from the breast (n=7,971), lung (n=11,804), and colorectal cancer (n=6,240) cohorts. Results and Conclusion: We developed a corpus of 629 cancer patients' notes with annotations of 13,193 SDoH concepts/attributes from 19 categories of SDoH. The Bidirectional Encoder Representations from Transformers (BERT) model achieved the best strict/lenient F1 scores of 0.9216 and 0.9441 for SDoH concept extraction, and 0.9617 and 0.9626 for linking attributes to SDoH concepts. Fine-tuning the NLP models using new annotations from opioid use patients improved the strict/lenient F1 scores from 0.8172/0.8502 to 0.8312/0.8679. The extraction rates among 19 categories of SDoH varied greatly, where 10 SDoH could be extracted from >70% of cancer patients, but 9 SDoH had a low extraction rate (<70% of cancer patients). The SODA package with pre-trained transformer models is publicly available at https://github.com/uf-hobiinformatics-lab/SDoH_SODA.
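The abstract reports strict and lenient F1 scores without defining them. The sketch below illustrates one common convention in clinical concept extraction, which is assumed here rather than taken from the paper: a strict match requires identical span boundaries and label, while a lenient match only requires overlapping boundaries and the same label. The `Span` type, the `span_f1` function, and the example spans are hypothetical and are not part of the SODA package.

```python
from typing import List, Tuple

# Hypothetical span representation: (start offset, end offset exclusive, label).
Span = Tuple[int, int, str]


def _overlaps(a: Span, b: Span) -> bool:
    """True when two spans share a label and at least one character."""
    return a[2] == b[2] and a[0] < b[1] and b[0] < a[1]


def span_f1(gold: List[Span], pred: List[Span], lenient: bool = False) -> float:
    """F1 over spans; strict needs exact equality, lenient needs overlap."""
    match = _overlaps if lenient else (lambda a, b: a == b)
    # Precision counts predicted spans that match some gold span.
    matched_pred = sum(any(match(p, g) for g in gold) for p in pred)
    # Recall counts gold spans that are matched by some predicted span.
    matched_gold = sum(any(match(g, p) for p in pred) for g in gold)
    precision = matched_pred / len(pred) if pred else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up annotations for illustration only.
gold = [(0, 7, "Smoking"), (20, 30, "Employment"), (40, 48, "Education")]
pred = [(0, 7, "Smoking"), (22, 30, "Employment"), (50, 55, "Alcohol")]

strict_f1 = span_f1(gold, pred)
lenient_f1 = span_f1(gold, pred, lenient=True)
print(f"strict F1 = {strict_f1:.4f}, lenient F1 = {lenient_f1:.4f}")
```

In this toy case the "Employment" prediction has a shifted start offset, so it counts only under the lenient criterion. This is why lenient scores are never lower than strict ones, consistent with the pairs reported in the abstract.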

Authors (15)
  1. Zehao Yu (41 papers)
  2. Xi Yang (160 papers)
  3. Chong Dang (2 papers)
  4. Prakash Adekkanattu (6 papers)
  5. Braja Gopal Patra (4 papers)
  6. Yifan Peng (147 papers)
  7. Jyotishman Pathak (12 papers)
  8. Debbie L. Wilson (1 paper)
  9. Ching-Yuan Chang (1 paper)
  10. Wei-Hsuan Lo-Ciganic (2 papers)
  11. Thomas J. George (3 papers)
  12. Yi Guo (115 papers)
  13. Jiang Bian (229 papers)
  14. Yonghui Wu (115 papers)
  15. William R. Hogan (5 papers)
Citations (8)