
Unlocking Accuracy and Fairness in Differentially Private Image Classification (2308.10888v1)

Published 21 Aug 2023 in cs.LG, cs.CV, and cs.CY

Abstract: Privacy-preserving machine learning aims to train models on private data without leaking sensitive information. Differential privacy (DP) is considered the gold standard framework for privacy-preserving training, as it provides formal privacy guarantees. However, compared to their non-private counterparts, models trained with DP often have significantly reduced accuracy. Private classifiers are also believed to exhibit larger performance disparities across subpopulations, raising fairness concerns. The poor performance of classifiers trained with DP has prevented the widespread adoption of privacy preserving machine learning in industry. Here we show that pre-trained foundation models fine-tuned with DP can achieve similar accuracy to non-private classifiers, even in the presence of significant distribution shifts between pre-training data and downstream tasks. We achieve private accuracies within a few percent of the non-private state of the art across four datasets, including two medical imaging benchmarks. Furthermore, our private medical classifiers do not exhibit larger performance disparities across demographic groups than non-private models. This milestone to make DP training a practical and reliable technology has the potential to widely enable machine learning practitioners to train safely on sensitive datasets while protecting individuals' privacy.

Authors (9)
  1. Leonard Berrada (14 papers)
  2. Soham De (38 papers)
  3. Judy Hanwen Shen (21 papers)
  4. Jamie Hayes (47 papers)
  5. Robert Stanforth (18 papers)
  6. David Stutz (24 papers)
  7. Pushmeet Kohli (116 papers)
  8. Samuel L. Smith (27 papers)
  9. Borja Balle (54 papers)
Citations (12)

Summary

Insights on Fine-Tuning Pre-Trained Models with Differential Privacy

The paper "Unlocking Accuracy and Fairness in Differentially Private Image Classification" addresses a critical shortcoming in the application of differential privacy (DP) to machine learning: the trade-off between privacy and model accuracy. It pushes the limits of differentially private training in deep learning, achieving a balance between privacy guarantees and the accuracy of models, specifically in the context of image classification tasks. Herein, we examine the contributions and implications of the paper.

Summary and Technical Contributions

  1. Pre-trained Models with DP Fine-Tuning: The paper demonstrates that using large, pre-trained neural network models considerably aids in overcoming the accuracy deficit seen in DP-trained models. By fine-tuning these pre-trained models with DP-SGD, the authors report accuracy levels close to non-private baselines across multiple datasets, including challenging medical imaging benchmarks (see the DP-SGD sketch immediately after this list).
  2. Performance Across Diverse Datasets: Fine-tuning with DP achieves near state-of-the-art accuracy across four datasets, most notably in medical imaging, where privacy concerns are acute. On CheXpert and MIMIC-CXR, the authors obtain private accuracies that closely match those of non-private models.
  3. Fairness Analysis: Contrary to concerns that DP exacerbates performance disparities across demographic groups, the paper provides evidence that differentially private models can perform on par with non-private counterparts in terms of fairness. It analyzes AUC disparities across groups and finds no significant fairness deterioration in private models (a sketch of such a per-group AUC comparison also follows this list).
  4. Implications for Deployment and Private Training: By aligning with the paradigms used in industry, such as utilizing large foundation models pre-trained on public data, this work suggests that DP is ready for wider industrial and governmental adoption. The methodology provides machine learning practitioners the capability to train models with necessary privacy guarantees, while retaining high levels of accuracy.
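
To make the fine-tuning recipe concrete, here is a minimal sketch of a DP-SGD update step in PyTorch. This is not the authors' implementation: the per-example loop is written for clarity (production code vectorizes per-example gradient computation), and parameter names such as `clip_norm` and `noise_multiplier` are illustrative.

```python
import torch

def dp_sgd_step(model, loss_fn, batch_x, batch_y, optimizer,
                clip_norm=1.0, noise_multiplier=1.0):
    """One DP-SGD update: clip each per-example gradient to L2 norm
    `clip_norm`, sum them, add Gaussian noise with standard deviation
    `noise_multiplier * clip_norm`, average, then step the optimizer."""
    params = [p for p in model.parameters() if p.requires_grad]
    summed = [torch.zeros_like(p) for p in params]

    # Per-example loop for clarity; real implementations vectorize this.
    for x, y in zip(batch_x, batch_y):
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        grads = torch.autograd.grad(loss, params)
        # Clip the concatenated per-example gradient to norm <= clip_norm.
        total_norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = torch.clamp(clip_norm / (total_norm + 1e-6), max=1.0)
        for s, g in zip(summed, grads):
            s.add_(g * scale)

    for p, s in zip(params, summed):
        noise = torch.randn_like(s) * (noise_multiplier * clip_norm)
        p.grad = (s + noise) / len(batch_x)
    optimizer.step()
    optimizer.zero_grad()
```

In practice, `noise_multiplier` is chosen with a privacy accountant to meet a target (ε, δ) budget; starting from a strong pre-trained checkpoint is what allows the clipped, noised updates to still reach high accuracy.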
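The fairness claim rests on comparing AUC across demographic groups. Below is a hedged sketch, using scikit-learn, of how such a per-group AUC gap can be computed; it is not the paper's evaluation code, and `groups` stands in for whatever demographic attribute is available.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def auc_by_group(y_true, y_score, groups):
    """Per-group AUC and the max-min AUC gap across demographic groups.

    y_true:  binary labels, shape (n,)
    y_score: model scores or probabilities, shape (n,)
    groups:  demographic group label per example, shape (n,)
    """
    y_true, y_score, groups = map(np.asarray, (y_true, y_score, groups))
    aucs = {g: roc_auc_score(y_true[groups == g], y_score[groups == g])
            for g in np.unique(groups)}
    return aucs, max(aucs.values()) - min(aucs.values())
```

Running this for both the private and the non-private classifier, and comparing the resulting gaps, is the kind of disparity check the paper reports.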

Major Results

  • For the CheXpert dataset, the fine-tuned model attains an AUC of 89.24% with differential privacy at ε = 8, compared with the best non-private AUC of 93.0%.
  • On ImageNet, leveraging an NFNet-F7+ model pre-trained on JFT, a top-1 accuracy of 88.5% was achieved under ε = 8.
  • Places-365 further exemplifies the strength of private models, reaching 58.2% accuracy under the same privacy budget, closely tracking the non-private state of the art.
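
For intuition about what an ε = 8 budget means mechanically, the sketch below converts a Gaussian noise multiplier and a step count into an (ε, δ) guarantee via Rényi differential privacy. It deliberately ignores privacy amplification by subsampling, which real DP-SGD accounting exploits, so it overestimates ε for actual training runs; all numbers here are illustrative, not taken from the paper.

```python
import math

def rdp_gaussian(sigma: float, steps: int, alpha: float) -> float:
    # Renyi DP of order `alpha` for `steps` compositions of the Gaussian
    # mechanism with noise multiplier `sigma` (sensitivity 1):
    #   eps_RDP(alpha) = steps * alpha / (2 * sigma^2)
    return steps * alpha / (2 * sigma ** 2)

def approx_dp_epsilon(sigma: float, steps: int, delta: float) -> float:
    # Standard RDP -> (eps, delta)-DP conversion, minimized over alpha:
    #   eps = eps_RDP(alpha) + log(1/delta) / (alpha - 1)
    alphas = (1 + k / 10 for k in range(1, 2000))
    return min(rdp_gaussian(sigma, steps, a) + math.log(1 / delta) / (a - 1)
               for a in alphas)

# Illustrative: sigma = 22 over 1000 steps at delta = 1e-5 lands just under eps = 8.
print(approx_dp_epsilon(sigma=22.0, steps=1000, delta=1e-5))
```

With subsampling amplification, far less noise is needed for the same budget, which is part of why DP-SGD fine-tuning can stay accurate.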

Broader Implications

This work has implications for both the theory and practice of machine learning:

  • Theoretical Implications: The research challenges existing assumptions about the feasibility of DP in high-dimensional models and paves the way for further exploration into applying DP to other complex tasks and architectures.
  • Practical Applications: The ability to deploy privacy-preserving techniques without sacrificing model performance is crucial for sectors such as healthcare, where protecting sensitive information is vital. This work could lead to broader acceptance and use of differential privacy in real-world deployments.

Future Directions

While the research provides strong evidence for the practical viability of DP in high-dimensional models, several directions remain open. Future work could further narrow the gap between private and non-private performance, improve the computational efficiency of DP training, and extend these techniques to other modalities such as audio and video. Transfer learning across significantly different domains under privacy constraints also remains a promising and largely unexplored area.

In conclusion, this research marks a significant stride in reconciling privacy with performance in machine learning models, an endeavor that remains of paramount importance for ethical and responsible AI deployment.
