Synth-SBDH: A Synthetic Dataset of Social and Behavioral Determinants of Health for Clinical Text (2406.06056v2)
Abstract: Social and behavioral determinants of health (SBDH) play a crucial role in health outcomes and are frequently documented in clinical text. Automatically extracting SBDH information from clinical text relies on publicly available good-quality datasets. However, existing SBDH datasets exhibit substantial limitations in their availability and coverage. In this study, we introduce Synth-SBDH, a novel synthetic dataset with detailed SBDH annotations, encompassing status, temporal information, and rationale across 15 SBDH categories. We showcase the utility of Synth-SBDH on three tasks using real-world clinical datasets from two distinct hospital settings, highlighting its versatility, generalizability, and distillation capabilities. Models trained on Synth-SBDH consistently outperform counterparts with no Synth-SBDH training, achieving up to 63.75% macro-F improvements. Additionally, Synth-SBDH proves effective for rare SBDH categories and under-resource constraints while being substantially cheaper than expert-annotated real-world data. Human evaluation reveals a 71.06% Human-LLM alignment and uncovers areas for future refinements.
- Avijit Mitra (7 papers)
- Emily Druhl (3 papers)
- Raelene Goodwin (2 papers)
- Hong Yu (114 papers)