An Empirical Framework for Domain Generalization in Clinical Settings (2103.11163v2)
Abstract: Clinical machine learning models experience significantly degraded performance on datasets not seen during training, e.g., new hospitals or populations. Recent developments in domain generalization offer a promising solution to this problem by creating models that learn invariances across environments. In this work, we benchmark the performance of eight domain generalization methods on multi-site clinical time series and medical imaging data. We introduce a framework to induce synthetic but realistic domain shifts and sampling bias to stress-test these methods over existing non-healthcare benchmarks. We find that current domain generalization methods do not consistently achieve significant gains in out-of-distribution performance over empirical risk minimization on real-world medical imaging data, in line with prior work on general imaging datasets. However, a subset of realistic induced-shift scenarios in clinical time series data does exhibit limited performance gains. We characterize these scenarios in detail and recommend best practices for domain generalization in the clinical setting.
- Haoran Zhang (102 papers)
- Natalie Dullerud (10 papers)
- Laleh Seyyed-Kalantari (10 papers)
- Quaid Morris (11 papers)
- Shalmali Joshi (24 papers)
- Marzyeh Ghassemi (96 papers)