Private, fair and accurate: Training large-scale, privacy-preserving AI models in medical imaging (2302.01622v5)

Published 3 Feb 2023 in eess.IV, cs.AI, cs.CR, cs.CV, and cs.LG

Abstract: AI models are increasingly used in the medical domain. However, as medical data is highly sensitive, special precautions to ensure its protection are required. The gold standard for privacy preservation is the introduction of differential privacy (DP) to model training. Prior work indicates that DP has negative implications on model accuracy and fairness, which are unacceptable in medicine and represent a main barrier to the widespread use of privacy-preserving techniques. In this work, we evaluated the effect of privacy-preserving training of AI models regarding accuracy and fairness compared to non-private training. For this, we used two datasets: (1) A large dataset (N=193,311) of high quality clinical chest radiographs, and (2) a dataset (N=1,625) of 3D abdominal computed tomography (CT) images, with the task of classifying the presence of pancreatic ductal adenocarcinoma (PDAC). Both were retrospectively collected and manually labeled by experienced radiologists. We then compared non-private deep convolutional neural networks (CNNs) and privacy-preserving (DP) models with respect to privacy-utility trade-offs measured as area under the receiver-operator-characteristic curve (AUROC), and privacy-fairness trade-offs, measured as Pearson's r or Statistical Parity Difference. We found that, while the privacy-preserving trainings yielded lower accuracy, they did largely not amplify discrimination against age, sex or co-morbidity. Our study shows that -- under the challenging realistic circumstances of a real-life clinical dataset -- the privacy-preserving training of diagnostic deep learning models is possible with excellent diagnostic accuracy and fairness.

Authors (9)
  1. Soroosh Tayebi Arasteh (23 papers)
  2. Alexander Ziller (26 papers)
  3. Christiane Kuhl (22 papers)
  4. Marcus Makowski (9 papers)
  5. Sven Nebelung (23 papers)
  6. Rickmer Braren (34 papers)
  7. Daniel Rueckert (335 papers)
  8. Daniel Truhn (51 papers)
  9. Georgios Kaissis (79 papers)
Citations (12)

Summary

Private, Fair and Accurate: Training Large-Scale, Privacy-Preserving AI Models in Medical Imaging

The paper "Private, Fair and Accurate: Training Large-Scale, Privacy-Preserving AI Models in Medical Imaging" explores the intricacies of developing AI models that maintain high diagnostic accuracy while ensuring patient data privacy and fairness across demographic groups. This research stands at the intersection of AI, privacy, and medical ethics, focusing on differential privacy (DP) mechanisms for model training.

Summary of Research

The authors assess the impact of differential privacy (DP) on the utility and fairness of AI models for medical imaging, specifically chest radiograph diagnosis and the detection of pancreatic ductal adenocarcinoma (PDAC) in abdominal CT. Using two distinct datasets, one comprising 193,311 high-quality chest radiographs and the other 1,625 3D abdominal CT images, they evaluate the performance of DP-trained models against non-privacy-preserving variants.

The main metrics employed for this comparison are the area under the receiver operating characteristic curve (AUROC) for the privacy-utility trade-off, and Pearson's r or the statistical parity difference for the privacy-fairness trade-off. The authors emphasize that although DP models generally show a slight reduction in utility, they do not exacerbate discrimination against subgroups defined by age, sex, or comorbidity.
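To make these metrics concrete, the minimal sketch below (assuming scikit-learn and NumPy; the random data and the statistical_parity_difference helper are illustrative, not taken from the paper) shows how AUROC and statistical parity difference can be computed for a binary classifier and a binary subgroup attribute.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def statistical_parity_difference(y_pred, group):
    """Difference in positive-prediction rates between two subgroups.

    y_pred: binary predictions (0/1); group: binary subgroup indicator
    (e.g. sex or an age bin). Values near 0 indicate similar treatment.
    """
    y_pred, group = np.asarray(y_pred), np.asarray(group)
    return y_pred[group == 1].mean() - y_pred[group == 0].mean()

# Illustrative random data, not the paper's datasets.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 1000)   # ground-truth labels
scores = rng.random(1000)           # model scores in [0, 1]
group = rng.integers(0, 2, 1000)    # protected-attribute indicator

auroc = roc_auc_score(y_true, scores)                     # utility metric
spd = statistical_parity_difference(scores > 0.5, group)  # fairness metric
print(f"AUROC = {auroc:.3f}, SPD = {spd:+.3f}")
```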

Key Findings

  1. Utility of DP Models: Despite stringent privacy guarantees, the DP models achieved an AUROC of 87% on the chest radiograph dataset and 95.6% on the 3D CT dataset, with only modest declines compared to their non-DP counterparts. Specifically, at an ε value as low as 0.29, the models still reached AUROCs of 83% and 86.8%, respectively (the (ε, δ)-guarantee that these ε values quantify is recalled after this list).
  2. Fairness Considerations: The analysis shows that privacy-preserving models do not introduce significant fairness concerns. For instance, younger patients, who typically constitute a smaller portion of the dataset, did not face amplified discrimination; in fact, they slightly benefited from higher privacy levels, as reflected in improved fairness metrics.
  3. Correlation with Population Subsets: Both non-private and DP models showed a positive correlation between diagnostic performance and the sample size of specific conditions. This indicates the need for sufficient data representation across all diagnoses to mitigate performance dips under DP constraints.
  4. Implications on Age and Comorbidity: The results suggest that older patients and those with higher comorbidities tend to pose a more significant challenge for both DP and non-DP models. However, the drop-off in performance for these subgroups remained consistent regardless of the privacy setting, suggesting robustness in the fairness of DP models.
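For context, the ε values above quantify the standard (ε, δ)-differential-privacy guarantee; the statement below is the textbook definition, not a formula specific to this paper. A randomized training mechanism M satisfies (ε, δ)-DP if, for all datasets D and D' differing in a single record and all sets of outcomes S,

```latex
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[\mathcal{M}(D') \in S] + \delta
```

A smaller ε (such as 0.29) means the distribution over trained models changes very little when any one patient's record is added or removed, i.e. a stronger privacy guarantee.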

Methodological Insights

Differential privacy was implemented by clipping each sample's gradient, which bounds the contribution of any single data point, and adding calibrated Gaussian noise to the gradients during training (the DP-SGD approach). The paper used a ResNet-9 architecture modified for compatibility with DP (substituting batch normalization with group normalization) and pretrained on public datasets to improve initial model performance.
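As a rough illustration of this training setup, the sketch below uses Opacus, a PyTorch DP library cited by the paper, to wrap a standard classifier for DP-SGD; the backbone, hyperparameters, and data loader are placeholders, not the paper's actual configuration. ModuleValidator.fix performs the batch-norm-to-group-norm substitution mentioned above.

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset
from torchvision.models import resnet18
from opacus import PrivacyEngine
from opacus.validators import ModuleValidator

# Stand-in backbone (the paper uses a ResNet-9 variant; resnet18 is only
# a placeholder here).
model = resnet18(num_classes=2)

# DP-SGD is incompatible with batch normalization; this swaps BatchNorm
# layers for GroupNorm, mirroring the substitution described in the paper.
model = ModuleValidator.fix(model)

# Toy data loader standing in for the (private) imaging dataset.
train_loader = DataLoader(
    TensorDataset(torch.randn(64, 3, 224, 224), torch.randint(0, 2, (64,))),
    batch_size=16,
)

optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
criterion = nn.CrossEntropyLoss()

# Wrap model/optimizer/loader so that per-sample gradients are clipped to
# max_grad_norm and Gaussian noise scaled by noise_multiplier is added.
privacy_engine = PrivacyEngine()
model, optimizer, train_loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=train_loader,
    noise_multiplier=1.0,   # illustrative value, not the paper's setting
    max_grad_norm=1.0,      # per-sample gradient clipping bound
)

model.train()
for images, labels in train_loader:
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()

# Privacy budget spent so far, at an assumed delta.
epsilon = privacy_engine.get_epsilon(delta=1e-5)
print(f"epsilon spent: {epsilon:.2f}")
```

The noise multiplier, clipping bound, and delta shown here are generic illustrative choices; the privacy levels reported in the paper correspond to its own accounting and hyperparameters.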

The authors carefully balanced technical rigor with empirical relevance by not solely relying on synthetic or overly curated datasets. Instead, they worked with real-world clinical data, which inherently includes noise and variability, thus providing a more reliable benchmark for their conclusions.

Implications and Future Directions

This work provides compelling evidence that highly accurate and fair diagnostic AI models can be trained under rigorous privacy constraints. The practical implications are vast, particularly for medical institutions concerned with data governance and compliance with legal frameworks like GDPR.

Theoretically, the findings extend the understanding of privacy-utility and privacy-fairness trade-offs in AI, reinforcing that privacy-preserving methods can offer robust safeguards without severely compromising model performance.

Future research might explore more diverse datasets, including other medical conditions and imaging modalities, to generalize these findings further. Additionally, investigating advanced model architectures and alternative privacy-preserving techniques might yield even better performance metrics.

Conclusion

This paper effectively demonstrates that the integration of differential privacy into the training of large-scale AI models for medical imaging does not necessitate a binary choice between utility and privacy. Through careful methodology and rigorous analysis, the authors provide a detailed view into how privacy-preserving models can achieve high accuracy and maintain fairness across various patient demographics, thereby paving the way for broader adoption in clinical settings.