- The paper introduces Dopamine, which integrates DPSGD with secure aggregation to enforce near-centralized differential privacy in federated learning.
- It employs Gaussian noise and homomorphic encryption to balance privacy with model utility, yielding improved accuracy on diabetic retinopathy data.
- The study demonstrates that multi-hospital collaborations can safeguard sensitive patient data while maintaining robust AI performance.
An Expert Review of "Dopamine: Differentially Private Secure Federated Learning on Medical Data"
The paper "Dopamine: Differentially Private Secure Federated Learning on Medical Data," by Malekzadeh et al., proposes a method for training deep models on medical data via secure federated learning (FL) under differential privacy (DP) guarantees. Given the sensitive nature of medical data and the legal constraints on its use, the paper addresses a central challenge in privacy-preserving artificial intelligence.
Methodological Overview
The authors propose a system, Dopamine, that combines differentially private stochastic gradient descent (DPSGD) with secure aggregation mechanisms tailored to federated learning. The focus is on balancing the utility of the trained deep neural networks (DNNs) against the privacy guarantees afforded to the data subjects, here patients.
Key components of the methodology involve:
- Federated Learning Framework: Hospitals collaborate with a central server to train a global DNN model without sharing individual patient data.
- Differential Privacy Mechanism: Gaussian noise is added to gradients computed at the hospital level, bounding what can be inferred about any individual patient; this is coordinated with secure aggregation so that individual model updates are never exposed to the central server.
- Secure Aggregation: Utilization of homomorphic encryption techniques to facilitate secure computations over encrypted data, ensuring that individual model updates remain hidden from the central entity.
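The local DPSGD step described above (per-example clipping followed by Gaussian noise) can be sketched as below. This is a minimal illustration of the general DPSGD recipe, not the paper's implementation; the function name `dpsgd_noisy_gradient` and the toy gradients are assumptions for demonstration.

```python
import numpy as np

def dpsgd_noisy_gradient(per_example_grads, clip_norm, noise_multiplier, rng):
    """One DPSGD gradient: clip each example's gradient to L2 norm
    <= clip_norm, average, then add Gaussian noise calibrated to the
    clipping bound (the standard recipe from Abadi et al., 2016)."""
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))
    mean_grad = np.mean(clipped, axis=0)
    # Noise std sigma = noise_multiplier * clip_norm / batch_size, so the
    # averaged gradient is released through the Gaussian mechanism.
    sigma = noise_multiplier * clip_norm / len(per_example_grads)
    return mean_grad + rng.normal(0.0, sigma, size=mean_grad.shape)

rng = np.random.default_rng(0)
grads = [rng.normal(size=4) for _ in range(8)]  # toy per-example gradients
noisy = dpsgd_noisy_gradient(grads, clip_norm=1.0, noise_multiplier=1.1, rng=rng)
```

Each hospital would release only such noised, averaged gradients, never raw per-patient gradients.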
Experimental Setup and Results
The experiments use a diabetic retinopathy (DR) dataset, with SqueezeNet as the reference DNN. The paper demonstrates that Dopamine provides a DP guarantee close to the centralized setting (i.e., the guarantee obtainable if all data were pooled on one server), while achieving higher classification accuracy than traditional FL approaches with uncoordinated DPSGD.
The results highlight that Dopamine attains:
- A close-to-centralized DP bound.
- Improved classification accuracy over FL baselines in which each client runs DPSGD independently, where uncoordinated noise addition degrades training dynamics.
Theoretical and Practical Implications
The paper marks a significant step in integrating DP into FL frameworks, particularly for medical datasets where privacy is paramount. Because secure aggregation hides individual updates from the server, each hospital can add less noise while still meeting the overall privacy budget, thereby improving model utility.
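To see why secure aggregation lets the server learn only the aggregate, here is a sketch using pairwise additive masking (in the style of Bonawitz et al.'s secure aggregation protocol) rather than the homomorphic encryption the paper employs; the helper `mask_updates` and the toy updates are hypothetical.

```python
import numpy as np

def mask_updates(updates, seed=0):
    """Pairwise additive masking: each pair (i, j) with i < j shares a
    random mask that client i adds and client j subtracts. Individual
    masked updates look random, but the masks cancel in the sum, so the
    server recovers exactly the aggregate of the raw updates."""
    rng = np.random.default_rng(seed)  # stand-in for pairwise shared keys
    n = len(updates)
    masked = [u.astype(float).copy() for u in updates]
    for i in range(n):
        for j in range(i + 1, n):
            mask = rng.normal(size=updates[0].shape)
            masked[i] += mask
            masked[j] -= mask
    return masked

updates = [np.ones(3) * k for k in range(1, 4)]  # toy per-hospital updates
masked = mask_updates(updates)
# The server sees only masked updates, yet the aggregate is exact:
aggregate = np.sum(masked, axis=0)
```

In a real deployment the masks would come from pairwise-agreed keys rather than a shared seed, and dropout handling adds further machinery; the cancellation property shown here is the core idea.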
Theoretically, the approach addresses the difficulty of tracking cumulative privacy loss across training rounds in FL by employing the moments accountant. Practically, this improves the feasibility of deploying privacy-aware machine learning models in real-world healthcare applications.
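As background (this is the standard bound from Abadi et al.'s moments accountant, stated informally, not the paper's specific derivation): with per-step sampling ratio $q = L/N$ and $T$ training steps, there exist constants $c_1, c_2$ such that for any $\varepsilon < c_1 q^2 T$, DPSGD is $(\varepsilon, \delta)$-differentially private whenever the noise scale satisfies

```latex
\sigma \;\geq\; c_2 \, \frac{q \sqrt{T \log(1/\delta)}}{\varepsilon}
```

This is asymptotically tighter than naive composition, which is what makes long federated training runs feasible under a fixed privacy budget.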
Future Research Directions
The paper opens several avenues for future exploration:
- Scalability and Generalization: Extending techniques to larger and more diverse datasets beyond diabetic retinopathy.
- Enhanced Encryption Techniques: Exploration of alternative secure aggregation methodologies that could reduce computational overheads or improve encryption robustness.
- Multi-Institutional Collaboration: Evaluating performance in truly distributed multi-institutional setups where data heterogeneity is a factor.
Conclusion
"Dopamine: Differentially Private Secure Federated Learning on Medical Data" presents a compelling approach towards enhancing privacy in federated learning applications within the medical domain. By combining differential privacy with secure aggregation, the paper not only advances theoretical understanding but also provides practical insights of potential interest to both AI and healthcare professionals. It stands as a promising foundation for subsequent research and applied innovations in privacy-preserving AI.