Differentially Private Model Publishing for Deep Learning (1904.02200v5)

Published 3 Apr 2019 in cs.CR and cs.LG

Abstract: Deep learning techniques based on neural networks have shown significant success in a wide range of AI tasks. Large-scale training datasets are one of the critical factors for their success. However, when the training datasets are crowdsourced from individuals and contain sensitive information, the model parameters may encode private information and bear the risks of privacy leakage. The recent growing trend of the sharing and publishing of pre-trained models further aggravates such privacy risks. To tackle this problem, we propose a differentially private approach for training neural networks. Our approach includes several new techniques for optimizing both privacy loss and model accuracy. We employ a generalization of differential privacy called concentrated differential privacy (CDP), with both a formal and refined privacy loss analysis on two different data batching methods. We implement a dynamic privacy budget allocator over the course of training to improve model accuracy. Extensive experiments demonstrate that our approach effectively improves privacy loss accounting, training efficiency and model quality under a given privacy budget.

Authors (5)
  1. Lei Yu (234 papers)
  2. Ling Liu (132 papers)
  3. Calton Pu (21 papers)
  4. Mehmet Emre Gursoy (14 papers)
  5. Stacey Truex (14 papers)
Citations (243)

Summary

An Overview of Differentially Private Model Publishing for Deep Learning

The paper "Differentially Private Model Publishing for Deep Learning" addresses the growing concern of privacy in deep learning models that are trained on sensitive data. As the deployment of deep learning models becomes more widespread, models often embed sensitive information from their training datasets, posing significant privacy risks through potential leakage. This paper introduces a framework aimed at mitigating these risks while maintaining model utility, employing techniques from differential privacy, specifically focused on optimizing privacy loss and model accuracy.

Core Contributions

The primary contribution of this research is the integration of differential privacy (DP) into the training of deep neural networks (DNNs). This is realized through several novel adaptations to the gradient descent algorithm, which forms the computational backbone of deep learning. The paper's approach ensures privacy through the following mechanisms:

  1. Concentrated Differential Privacy (CDP): The paper leverages a generalization of traditional DP, known as concentrated differential privacy, to provide a refined analysis of privacy loss. This approach is particularly suited for scenarios involving a significant number of computations, as it offers a more accurate account of cumulative privacy loss compared to the classical (ε, δ)-DP.
  2. Dynamic Privacy Budget Allocation: A novel framework for dynamically adjusting the privacy budget throughout the model training process is proposed. This dynamic allocation allows for improved model accuracy, as it optimizes the trade-off between privacy preservation and utility, as opposed to a static, uniform privacy budget (a schedule sketch follows this list).
  3. Data Batching Impact: The research highlights the significant impact of data batching methods on privacy loss. By distinguishing random reshuffling from random sampling, the paper shows that each leads to a different privacy loss profile and proposes a tailored accounting method for each.
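
These mechanisms build on the now-standard recipe for differentially private gradient descent: clip each per-example gradient and add calibrated Gaussian noise before applying the update. Below is a minimal NumPy sketch of one such step; the parameter names (clip_norm, noise_multiplier) are illustrative and not taken from the paper.

```python
import numpy as np

def noisy_gradient_step(per_example_grads, clip_norm=1.0, noise_multiplier=1.1):
    """Clip each per-example gradient to L2 norm clip_norm, average the
    clipped gradients, and add Gaussian noise calibrated to the clipping
    bound (the L2 sensitivity contributed by any single example)."""
    clipped = [
        g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
        for g in per_example_grads
    ]
    avg = np.mean(clipped, axis=0)
    noise = np.random.normal(
        0.0,
        noise_multiplier * clip_norm / len(per_example_grads),
        size=avg.shape,
    )
    return avg + noise

# Hypothetical usage with 32 per-example gradients of a 10-parameter model.
grads = [np.random.randn(10) for _ in range(32)]
update = noisy_gradient_step(grads)
```

Each such step is a Gaussian mechanism, so its privacy cost can be tracked with the CDP accounting described above.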

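The dynamic privacy budget allocator of item 2 can be pictured as a noise schedule that starts large and decays over epochs, with the cumulative privacy cost checked against the overall budget. The sketch below assumes plain sequential composition of unit-sensitivity Gaussian mechanisms in zCDP terms; the paper's refined accounting for reshuffling and sampling tightens these numbers, and the schedule shape and constants are illustrative only.

```python
import math

def exponential_noise_schedule(sigma0, decay_rate, epochs):
    """Noise multiplier per epoch: sigma_t = sigma0 * exp(-decay_rate * t).
    Less noise in later epochs means more of the privacy budget is spent
    where it helps accuracy most."""
    return [sigma0 * math.exp(-decay_rate * t) for t in range(epochs)]

def zcdp_spent(sigmas, steps_per_epoch):
    """Total zCDP cost under plain sequential composition, treating each
    step as a Gaussian mechanism with unit sensitivity (rho = 1 / (2*sigma^2))."""
    return sum(steps_per_epoch / (2.0 * s ** 2) for s in sigmas)

# Check that a candidate schedule stays within a target rho budget.
schedule = exponential_noise_schedule(sigma0=10.0, decay_rate=0.02, epochs=50)
print(f"total rho spent: {zcdp_spent(schedule, steps_per_epoch=600):.3f}")
```
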
Methodology and Results

The experimental evaluation demonstrates the efficacy of the proposed methods. Extensive experiments were conducted using diverse datasets (e.g., MNIST, CIFAR-10), showing consistent improvements in model accuracy while adhering to predefined privacy budgets. Notably, the dynamic adjustment of privacy parameters outperformed traditional uniform allocations, enhancing the model’s predictive performance across varying computational workloads.

  1. Privacy Loss Analysis: Through the application of CDP, the paper obtains a notably tighter estimate of cumulative privacy loss, underscoring the advantage of CDP for deep learning tasks that require many iterations and offering a more nuanced view of privacy dynamics in learning algorithms (a sketch of this style of accounting follows this list).
  2. Model Accuracy Enhancement: The dynamic privacy budget allocation methods demonstrably yielded higher model accuracy, indicating that adjusting the noise scale over the course of training can mitigate the accuracy degradation commonly seen with DP-SGD.
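
The tighter accounting in item 1 rests on standard zCDP facts (in the sense of Bun and Steinke): the zCDP cost of a single Gaussian mechanism, additive composition over steps, and conversion of the total back to an (ε, δ)-DP statement. With Δ the L2 sensitivity, σ the noise scale, and T the number of noisy steps (generic symbols, not necessarily the paper's notation):

```latex
% zCDP cost of one Gaussian mechanism, additive composition over T steps,
% and conversion of the total back to (epsilon, delta)-DP.
\[
  \rho = \frac{\Delta^2}{2\sigma^2}, \qquad
  \rho_{\mathrm{total}} = \sum_{t=1}^{T} \rho_t, \qquad
  \epsilon = \rho_{\mathrm{total}} + 2\sqrt{\rho_{\mathrm{total}} \ln(1/\delta)} .
\]
```

How each per-step ρ_t is charged is exactly where the batching method matters, which is the distinction between random reshuffling and random sampling that the paper's tailored accounting captures.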

Implications and Future Directions

The implications of this research are twofold: practically, it provides a viable means of deploying privacy-preserving machine learning models on sensitive datasets without compromising on performance. Theoretically, it broadens the application of concentrated differential privacy in iterative algorithms, contributing to the discourse on maintaining utility in privatized models.

Looking forward, there are several potential avenues for extending this work. Improving the interpretability of differential privacy parameters for end users remains a critical challenge. Future work could explore group-level DP guarantees, especially where multiple records from the same user are present. Furthermore, theoretical advances in understanding and bounding the privacy implications of data dependencies could make privacy-preserving methods more robust against sophisticated adversarial attacks.

In summary, this paper sets the stage for further exploration of differentially private deep learning, offering significant insights into safeguarding sensitive information in AI systems while maximizing their functional effectiveness.