Unlocking High-Accuracy Differentially Private Image Classification through Scale
The paper under review presents a comprehensive study of improving the accuracy of image classification under the constraints of differential privacy (DP). Differential privacy is a framework that guarantees a model reveals little about any individual training point, even to an adversary with access to the model. This is crucial for training on sensitive datasets.
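For readers who want the formal statement (standard in the literature, though not restated in the summary above): a randomized mechanism $M$ is $(\varepsilon, \delta)$-differentially private if, for all datasets $D, D'$ differing in a single record and every measurable set of outputs $S$,

$$\Pr[M(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[M(D') \in S] + \delta.$$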
Differentially Private Stochastic Gradient Descent (DP-SGD) Challenges
The paper focuses on DP-SGD, which enforces privacy by clipping each example's gradient and adding calibrated Gaussian noise during optimization. Historically, DP-SGD's efficacy has been challenged, with significant performance degradation reported on standard image classification benchmarks like CIFAR-10 and ImageNet. A key barrier is the injected noise, whose norm grows with the number of model parameters and was therefore thought to cripple larger models. Moreover, hyper-parameters such as batch size and learning rate must be tuned carefully to balance accuracy against the privacy guarantee.
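To make the mechanism concrete, here is a minimal NumPy sketch of a single DP-SGD update. The function and hyper-parameter names are illustrative (not taken from the paper's code), and `per_example_grads` is a hypothetical stand-in for gradients computed separately for each example.

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, lr=0.1, clip_norm=1.0, noise_multiplier=1.0):
    """One DP-SGD update: clip each per-example gradient, sum, add noise, average."""
    batch_size = per_example_grads.shape[0]
    # Clip each example's gradient to L2 norm at most clip_norm.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    clipped = per_example_grads * np.minimum(1.0, clip_norm / (norms + 1e-12))
    # Add Gaussian noise whose scale is calibrated to the clipping norm,
    # then average over the batch.
    noise = np.random.normal(0.0, noise_multiplier * clip_norm, size=params.shape)
    noisy_grad = (clipped.sum(axis=0) + noise) / batch_size
    return params - lr * noisy_grad

# Illustrative usage with random stand-in gradients.
params = np.zeros(10)
grads = np.random.randn(32, 10)   # 32 per-example gradients for a 10-parameter model
params = dp_sgd_step(params, grads)
```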
Main Contributions
- Enhanced Hyper-parameter Tuning and Architectures: The paper demonstrates that with careful hyper-parameter tuning and architectural modifications, including augmentation multiplicity, large batch sizes, group normalization, and parameter averaging, DP-SGD performance improves substantially (a sketch of two of these techniques follows this list). Specifically, a 40-layer Wide-ResNet trained without extra data reaches 81.4% top-1 accuracy on CIFAR-10 under (8, 10⁻⁵)-DP, well above the previous best of 71.7%.
- Pre-trained Model Fine-tuning: Pre-training on large non-sensitive datasets, followed by fine-tuning on the private data with DP-SGD, yields substantial further gains. Remarkably, a pre-trained NFNet-F3 fine-tuned on ImageNet achieved 83.8% top-1 accuracy under (0.5, 8×10⁻⁷)-DP. This narrows the gap with non-private training results, marking a milestone in DP-respecting model training.
- Optimization Insights: Experiments reveal that the best DP-SGD configuration depends jointly on the privacy budget, batch size, and model depth. In particular, larger batch sizes shrink the relative magnitude of the injected noise and improve accuracy significantly, at the cost of more computation (see the arithmetic sketch after this list).
- Demonstration on Large Datasets: The results extend beyond CIFAR-10 to ImageNet and Places-365, affirming the approach's scalability and robustness. On ImageNet, DP-SGD-trained models improve markedly over prior reported accuracies even when trained from random initialization.
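As referenced in the first bullet, below is a minimal sketch of two of the techniques the paper combines: augmentation multiplicity and parameter averaging. `grad_fn` and `augment` are hypothetical stand-ins, and the default values are illustrative rather than the paper's settings.

```python
import numpy as np

def augmentation_multiplicity_grad(grad_fn, params, example, augment, k=8):
    """Average the gradient over k random augmentations of ONE example.

    The average is then treated as that example's gradient, so the usual
    per-example clipping still bounds its sensitivity and the privacy
    accounting is unchanged.
    """
    grads = [grad_fn(params, augment(example)) for _ in range(k)]
    return np.mean(grads, axis=0)

def ema_update(ema_params, params, decay=0.9999):
    """Parameter averaging: maintain an exponential moving average of the
    weights across training steps and evaluate the averaged weights at test time."""
    return decay * ema_params + (1.0 - decay) * params
```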
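The batch-size effect noted in the third bullet comes down to simple arithmetic: the Gaussian noise added to the summed gradient has scale σC (noise multiplier times clipping norm), so its contribution to the averaged gradient shrinks as 1/B. The values below are illustrative, not the paper's configurations.

```python
clip_norm, noise_multiplier = 1.0, 1.0  # illustrative values
for batch_size in (256, 1024, 4096, 16384):
    # Std of the noise on the *averaged* gradient: sigma * C / B.
    effective_noise = noise_multiplier * clip_norm / batch_size
    print(f"B={batch_size:>5}  effective noise std = {effective_noise:.6f}")
```

At a fixed privacy budget, the accountant does force σ to increase with the sampling rate, but it typically grows more slowly than B, so the effective noise per update still falls as the batch grows.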
Implications and Future Work
The findings show that standard over-parameterized models, once considered unsuitable under DP constraints, can match or exceed architectures tailored specifically for DP training. This has practical implications for deploying privacy-conscious models across various domains without a substantial sacrifice in accuracy. Furthermore, the work demonstrates that pre-training on non-sensitive data followed by DP fine-tuning is a promising recipe for privacy-aware machine learning.
Future research could explore more sophisticated pre-training schemes and architectures, perhaps applying emerging designs such as vision transformers under DP. The paper's methodology and results provide a roadmap for developing high-accuracy, DP-compliant models, moving the community towards a future where privacy does not inherently compromise model performance.