- The paper introduces the PATE framework that aggregates teacher model outputs via a noisy voting mechanism to achieve differential privacy.
- It uses semi-supervised learning techniques, including GANs, to cut the number of teacher queries the student needs, improving the privacy-utility trade-off.
- Empirical results on MNIST and SVHN demonstrate high accuracy under tight privacy bounds, underscoring its potential for sensitive-data applications.
Semi-supervised Knowledge Transfer for Deep Learning from Private Training Data
The paper "Semi-supervised Knowledge Transfer for Deep Learning from Private Training Data" introduces an approach called Private Aggregation of Teacher Ensembles (PATE) to enhance privacy guarantees in machine learning tasks involving sensitive datasets. This work addresses the pressing need for privacy in scenarios like medical data analysis, where misuse of sensitive training data could lead to severe privacy breaches.
Overview of the PATE Approach
The PATE approach aims to strike a balance between the utility of machine learning models and the privacy of the data they are trained on. The core idea is to train multiple models, referred to as "teachers," on disjoint subsets of the sensitive data and to use their ensemble to supervise a "student" model. The student is trained in a way that gives it no direct access to the sensitive data or to the internal parameters of any single teacher.
Rather than exposing their parameters, the teachers each predict a label for non-sensitive (public, unlabeled) inputs, and a noisy voting procedure over those predictions selects the label passed to the student. This noisy aggregation provides differential privacy, ensuring that no individual example in the sensitive datasets overly influences the student's training. Because the guarantee does not depend on how the teachers were trained, the approach achieves a good privacy-utility trade-off even for deep neural networks (DNNs) and other complex models.
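To make the voting step concrete, below is a minimal sketch of the noisy-max aggregation, assuming each teacher has already produced a class index for a given public input; the function name, interface, and use of NumPy are illustrative choices rather than the authors' code. Following the paper's description, Laplace noise of scale 1/γ is added to each vote count before the arg max is taken; larger noise (smaller γ) strengthens the per-query privacy guarantee at the cost of label accuracy.

```python
import numpy as np

def noisy_aggregate(teacher_preds, num_classes, gamma, rng=None):
    """Label one public input by a noisy plurality vote over teacher predictions.

    teacher_preds: per-teacher predicted class indices for a single input.
    gamma: privacy parameter; the Laplace noise scale is 1/gamma.
    """
    rng = rng if rng is not None else np.random.default_rng()
    # Count how many teachers voted for each class.
    votes = np.bincount(np.asarray(teacher_preds, dtype=np.int64),
                        minlength=num_classes).astype(float)
    # Add independent Laplace noise to every count, then release only the
    # arg max of the noisy counts -- never the counts themselves.
    votes += rng.laplace(loc=0.0, scale=1.0 / gamma, size=num_classes)
    return int(np.argmax(votes))
```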
Technical Details
- Training the Teachers:
- Sensitive data is partitioned into several disjoint subsets, each used to train individual teacher models.
- The trained teachers predict outputs on non-sensitive data, and their predictions are aggregated by a noisy voting process to determine the label.
- Aggregation Mechanism:
- Laplace noise is added to each class's vote count before the arg max is taken, so that every answered query satisfies a differential privacy guarantee.
- This approach mitigates risks associated with querying models that have unfettered access to sensitive data.
- Training the Student:
- The student model is trained on non-sensitive, auxiliary data whose labels come from the noisy aggregation mechanism applied to the teacher ensemble (an end-to-end sketch of the pipeline follows this list).
- Semi-supervised learning techniques, especially Generative Adversarial Networks (GANs), sharply reduce the number of label queries sent to the teachers, improving the privacy-utility trade-off.
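As referenced above, here is an end-to-end sketch of the pipeline under simplifying assumptions: `make_model` is a hypothetical factory returning any classifier with `fit`/`predict` methods, and the student is trained by plain supervised learning on the noisily labeled queries, whereas the paper's strongest results train the student semi-supervisedly with a GAN so most public data can remain unlabeled. It reuses the `noisy_aggregate` helper from the earlier sketch.

```python
import numpy as np

def train_pate_student(sensitive_x, sensitive_y, public_x, num_teachers,
                       num_classes, gamma, make_model, num_queries):
    """PATE pipeline sketch: partition, train teachers, label, train student."""
    # 1. Partition the sensitive data into disjoint shards, one per teacher.
    shards_x = np.array_split(sensitive_x, num_teachers)
    shards_y = np.array_split(sensitive_y, num_teachers)

    # 2. Train each teacher on its own shard only.
    teachers = []
    for xs, ys in zip(shards_x, shards_y):
        model = make_model()
        model.fit(xs, ys)
        teachers.append(model)

    # 3. Answer only a limited number of label queries on public inputs;
    #    each answered query consumes part of the privacy budget.
    queries = public_x[:num_queries]
    teacher_votes = np.stack([t.predict(queries) for t in teachers])  # (T, Q)
    labels = np.array([
        noisy_aggregate(teacher_votes[:, i], num_classes, gamma)
        for i in range(len(queries))
    ])

    # 4. Train the student on public inputs and their noisy aggregate labels;
    #    the sensitive data and teacher parameters are never exposed to it.
    student = make_model()
    student.fit(queries, labels)
    return student
```

Because only the final student is released, anyone who inspects or queries it sees data protected by the noisy aggregation, regardless of the teachers' architecture.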
Numerical Results and Privacy Guarantees
The authors highlight the effectiveness of PATE through empirical evaluation on MNIST and SVHN datasets:
- MNIST: Achieved an accuracy of 98.00% with a differential privacy bound of (ε, δ) = (2.04, 10⁻⁵).
- SVHN: Achieved an accuracy of 90.66% with a differential privacy bound of (ε, δ) = (8.19, 10⁻⁶).
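For context, these (ε, δ) pairs refer to the standard definition of differential privacy, which bounds how much any single training example can change the mechanism's output distribution:

Pr[M(d) ∈ S] ≤ e^ε · Pr[M(d′) ∈ S] + δ

for all outcome sets S and all datasets d, d′ differing in a single entry. On MNIST, for instance, ε = 2.04 means that adding or removing one training example changes the probability of any outcome by at most a factor of e^2.04 ≈ 7.7, up to the slack δ = 10⁻⁵.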
These results improve on previous differentially private training mechanisms in both accuracy and the tightness of the privacy bounds.
Implications and Future Directions
Practical Implications:
- The PATE approach can be a game-changer in domains requiring stringent privacy but where models must still maintain high utility, such as healthcare, finance, and personalized user services.
- Its applicability across different architectures—extending to deep learning, random forests, and other non-convex models—makes it versatile for various machine learning tasks.
Theoretical Contributions:
- The paper introduces a data-dependent privacy analysis based on the moments accountant technique, which charges a smaller privacy cost for queries on which the teachers strongly agree and thereby tightens the overall differential privacy accounting (a small sketch of the final ε computation follows this list).
- The approach shows that significant privacy enhancements can be achieved without substantial sacrifices in model performance, paving the way for further innovations in privacy-preserving machine learning.
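As a rough illustration of the accounting step mentioned above, the sketch below converts accumulated log moments into a final (ε, δ) bound using the standard moments-accountant tail bound ε = min over λ of (α(λ) + log(1/δ)) / λ. It assumes the summed per-query log moment bounds α(λ) have already been computed; the data-dependent part of the paper's analysis (bounding each query's moments from the observed strength of the teachers' consensus) is not shown, and the function name is illustrative.

```python
import numpy as np

def eps_from_log_moments(log_moments, delta):
    """Turn accumulated log moments into an epsilon for a target delta.

    log_moments: iterable of (lam, alpha) pairs, where alpha is the sum of
    per-query log moment bounds at moment order lam. Applies the tail bound
    eps = (alpha + log(1/delta)) / lam, minimized over the available orders.
    """
    return min((alpha + np.log(1.0 / delta)) / lam
               for lam, alpha in log_moments if lam > 0)
```

Answering fewer student queries, or queries on which the teachers agree strongly, keeps the summed moments small and therefore yields a smaller ε at the same δ.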
Future Developments:
- Further research may explore the application of PATE to other complex models like RNNs and sequence-based frameworks.
- Future work may also pursue tighter privacy bounds, especially for datasets whose characteristics and underlying distributions differ substantially from MNIST and SVHN.
In conclusion, the PATE framework establishes a promising method for achieving differential privacy in machine learning models that require training on sensitive data. By combining semi-supervised learning and robust privacy accounting, PATE meets the dual objectives of data privacy and high model utility, contributing significantly to the field's ongoing quest for ethically sound machine learning practices.