Private, fair and accurate: Training large-scale, privacy-preserving AI models in medical imaging (2302.01622v5)

Published 3 Feb 2023 in eess.IV, cs.AI, cs.CR, cs.CV, and cs.LG

Abstract: AI models are increasingly used in the medical domain. However, as medical data is highly sensitive, special precautions to ensure its protection are required. The gold standard for privacy preservation is the introduction of differential privacy (DP) to model training. Prior work indicates that DP has negative implications on model accuracy and fairness, which are unacceptable in medicine and represent a main barrier to the widespread use of privacy-preserving techniques. In this work, we evaluated the effect of privacy-preserving training of AI models regarding accuracy and fairness compared to non-private training. For this, we used two datasets: (1) A large dataset (N=193,311) of high quality clinical chest radiographs, and (2) a dataset (N=1,625) of 3D abdominal computed tomography (CT) images, with the task of classifying the presence of pancreatic ductal adenocarcinoma (PDAC). Both were retrospectively collected and manually labeled by experienced radiologists. We then compared non-private deep convolutional neural networks (CNNs) and privacy-preserving (DP) models with respect to privacy-utility trade-offs measured as area under the receiver-operator-characteristic curve (AUROC), and privacy-fairness trade-offs, measured as Pearson's r or Statistical Parity Difference. We found that, while the privacy-preserving trainings yielded lower accuracy, they did largely not amplify discrimination against age, sex or co-morbidity. Our study shows that -- under the challenging realistic circumstances of a real-life clinical dataset -- the privacy-preserving training of diagnostic deep learning models is possible with excellent diagnostic accuracy and fairness.

Authors (9)
  1. Soroosh Tayebi Arasteh (23 papers)
  2. Alexander Ziller (26 papers)
  3. Christiane Kuhl (22 papers)
  4. Marcus Makowski (9 papers)
  5. Sven Nebelung (23 papers)
  6. Rickmer Braren (34 papers)
  7. Daniel Rueckert (335 papers)
  8. Daniel Truhn (51 papers)
  9. Georgios Kaissis (79 papers)
Citations (12)

Summary

Private, Fair and Accurate: Training Large-Scale, Privacy-Preserving AI Models in Medical Imaging

The paper "Private, Fair and Accurate: Training Large-Scale, Privacy-Preserving AI Models in Medical Imaging" explores the intricacies of developing AI models that maintain high diagnostic accuracy while ensuring patient data privacy and fairness across demographic groups. This research stands at the intersection of AI, privacy, and medical ethics, focusing on differential privacy (DP) mechanisms for model training.

Summary of Research

The authors assess the impact of differential privacy (DP) on the utility and fairness of AI models for medical imaging, specifically chest radiograph diagnosis and the detection of pancreatic ductal adenocarcinoma (PDAC) in abdominal CT. Using two distinct datasets, one comprising 193,311 high-quality chest radiographs and the other 1,625 3D abdominal CT images, they evaluate the performance of DP-trained models against non-privacy-preserving variants.

The main metrics employed for this comparison are the area under the receiver operating characteristic curve (AUROC) for the privacy-utility trade-off, and Pearson's r or the statistical parity difference for the privacy-fairness trade-off. The authors emphasize that although DP models generally show a slight reduction in utility, they do not exacerbate discrimination against subgroups defined by age, sex, or comorbidity.
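To make these metrics concrete, the minimal sketch below (assuming scikit-learn and NumPy; the random data and the statistical_parity_difference helper are illustrative, not taken from the paper) shows how AUROC and statistical parity difference can be computed for a binary classifier and a binary subgroup attribute.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def statistical_parity_difference(y_pred, group):
    """Difference in positive-prediction rates between two subgroups.

    y_pred: binary predictions (0/1); group: binary subgroup indicator
    (e.g. sex or an age bin). Values near 0 indicate similar treatment.
    """
    y_pred, group = np.asarray(y_pred), np.asarray(group)
    return y_pred[group == 1].mean() - y_pred[group == 0].mean()

# Illustrative random data, not the paper's datasets.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 1000)   # ground-truth labels
scores = rng.random(1000)           # model scores in [0, 1]
group = rng.integers(0, 2, 1000)    # protected-attribute indicator

auroc = roc_auc_score(y_true, scores)                     # utility metric
spd = statistical_parity_difference(scores > 0.5, group)  # fairness metric
print(f"AUROC = {auroc:.3f}, SPD = {spd:+.3f}")
```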

Key Findings

  1. Utility of DP Models: Despite stringent privacy guarantees, the DP models achieved an AUROC of 87% on the chest radiograph dataset and 95.6% on the 3D CT dataset, with only modest declines compared to their non-DP counterparts. Specifically, at an ε value as low as 0.29, the models still reached AUROCs of 83% and 86.8%, respectively (the (ε, δ)-guarantee that these ε values quantify is recalled after this list).
  2. Fairness Considerations: The analysis shows that privacy-preserving models do not introduce significant fairness concerns. For instance, younger patients, who typically constitute a smaller portion of the dataset, did not face amplified discrimination; in fact, they slightly benefited from higher privacy levels, as reflected in improved fairness metrics.
  3. Correlation with Population Subsets: Both non-private and DP models showed a positive correlation between diagnostic performance and the sample size of specific conditions. This indicates the need for sufficient data representation across all diagnoses to mitigate performance dips under DP constraints.
  4. Implications on Age and Comorbidity: The results suggest that older patients and those with higher comorbidities tend to pose a more significant challenge for both DP and non-DP models. However, the drop-off in performance for these subgroups remained consistent regardless of the privacy setting, suggesting robustness in the fairness of DP models.
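For context, the ε values above quantify the standard (ε, δ)-differential-privacy guarantee; the statement below is the textbook definition, not a formula specific to this paper. A randomized training mechanism M satisfies (ε, δ)-DP if, for all datasets D and D' differing in a single record and all sets of outcomes S,

```latex
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[\mathcal{M}(D') \in S] + \delta
```

A smaller ε (such as 0.29) means the distribution over trained models changes very little when any one patient's record is added or removed, i.e. a stronger privacy guarantee.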

Methodological Insights

Differential privacy was implemented by clipping each sample's gradient, which bounds the contribution of any single data point, and adding calibrated Gaussian noise to the gradients during training (the DP-SGD approach). The paper used a ResNet-9 architecture modified for compatibility with DP (substituting batch normalization with group normalization) and pretrained on public datasets to improve initial model performance.
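As a rough illustration of this training setup, the sketch below uses Opacus, a PyTorch DP library cited by the paper, to wrap a standard classifier for DP-SGD; the backbone, hyperparameters, and data loader are placeholders, not the paper's actual configuration. ModuleValidator.fix performs the batch-norm-to-group-norm substitution mentioned above.

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset
from torchvision.models import resnet18
from opacus import PrivacyEngine
from opacus.validators import ModuleValidator

# Stand-in backbone (the paper uses a ResNet-9 variant; resnet18 is only
# a placeholder here).
model = resnet18(num_classes=2)

# DP-SGD is incompatible with batch normalization; this swaps BatchNorm
# layers for GroupNorm, mirroring the substitution described in the paper.
model = ModuleValidator.fix(model)

# Toy data loader standing in for the (private) imaging dataset.
train_loader = DataLoader(
    TensorDataset(torch.randn(64, 3, 224, 224), torch.randint(0, 2, (64,))),
    batch_size=16,
)

optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
criterion = nn.CrossEntropyLoss()

# Wrap model/optimizer/loader so that per-sample gradients are clipped to
# max_grad_norm and Gaussian noise scaled by noise_multiplier is added.
privacy_engine = PrivacyEngine()
model, optimizer, train_loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=train_loader,
    noise_multiplier=1.0,   # illustrative value, not the paper's setting
    max_grad_norm=1.0,      # per-sample gradient clipping bound
)

model.train()
for images, labels in train_loader:
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()

# Privacy budget spent so far, at an assumed delta.
epsilon = privacy_engine.get_epsilon(delta=1e-5)
print(f"epsilon spent: {epsilon:.2f}")
```

The noise multiplier, clipping bound, and delta shown here are generic illustrative choices; the privacy levels reported in the paper correspond to its own accounting and hyperparameters.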

The authors carefully balanced technical rigor with empirical relevance by not solely relying on synthetic or overly curated datasets. Instead, they worked with real-world clinical data, which inherently includes noise and variability, thus providing a more reliable benchmark for their conclusions.

Implications and Future Directions

This work provides compelling evidence that highly accurate and fair diagnostic AI models can be trained under rigorous privacy constraints. The practical implications are vast, particularly for medical institutions concerned with data governance and compliance with legal frameworks like GDPR.

Theoretically, the findings extend the understanding of privacy-utility and privacy-fairness trade-offs in AI, reinforcing that privacy-preserving methods can offer robust safeguards without severely compromising model performance.

Future research might explore more diverse datasets, including other medical conditions and imaging modalities, to generalize these findings further. Additionally, investigating advanced model architectures and alternative privacy-preserving techniques might yield even better performance metrics.

Conclusion

This paper effectively demonstrates that the integration of differential privacy into the training of large-scale AI models for medical imaging does not necessitate a binary choice between utility and privacy. Through careful methodology and rigorous analysis, the authors provide a detailed view into how privacy-preserving models can achieve high accuracy and maintain fairness across various patient demographics, thereby paving the way for broader adoption in clinical settings.