Overview of CaPC Learning: Confidential and Private Collaborative Learning
The paper "CaPC Learning: Confidential and Private Collaborative Learning" introduces a novel framework that addresses the longstanding challenge of achieving both confidentiality and privacy in collaborative machine learning environments. Traditional methods like federated learning provide confidentiality yet fall short in preserving privacy, especially when sensitive data from sectors like healthcare and finance are involved. The proposed CaPC method incorporates innovations that enable multiple parties to securely collaborate without compromising the confidentiality of test data or the privacy of training data from distinct datasets.
Confidential and Private Collaborative (CaPC) Learning
CaPC departs from existing collaborative learning frameworks by removing two of their key limitations: the requirement that participating parties agree on a single model architecture, and the need for large participant pools to ensure privacy. The framework enables confidential and private collaborative learning among parties with heterogeneous model architectures and non-i.i.d. datasets.
The crux of CaPC's innovation is a hybrid approach combining secure multi-party computation (MPC), homomorphic encryption (HE), and differential privacy via private aggregation of teacher models. Together, these techniques establish a secure protocol that allows parties such as hospitals, each holding a different local model, to query one another for labels without revealing their queries or exposing their training data. Differential privacy is achieved by adding noise to the aggregated predictions before any result is released.
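As a flavor of the MPC ingredient, the toy below shows additive secret sharing, the primitive that lets votes be summed so that no single recipient sees an individual party's prediction. This is a minimal illustration under simplifying assumptions (a plain integer field, two aggregators, invented function names), not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
MOD = 2**61 - 1  # shares live in a large integer field (illustrative choice)

def share(value: int, n_parties: int) -> list[int]:
    """Split a value into additive shares that sum to the value mod MOD."""
    parts = [int(rng.integers(0, MOD)) for _ in range(n_parties - 1)]
    parts.append((value - sum(parts)) % MOD)
    return parts

def reconstruct(parts: list[int]) -> int:
    """Recombine shares; only the sum is ever visible."""
    return sum(parts) % MOD

# Four answering parties each split their vote for one class between two
# aggregators; neither aggregator alone learns any individual vote, yet the
# recombined partial sums reveal exactly the aggregate count.
votes = [1, 0, 1, 1]
shared = [share(v, n_parties=2) for v in votes]
partial_sums = [sum(s[i] for s in shared) % MOD for i in range(2)]
assert reconstruct(partial_sums) == sum(votes)
```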
Protocol Design
The protocol introduces a semi-trusted third party, the privacy guardian (PG), which adds the noise needed for differential privacy and mediates secure communication between the querying and answering parties:
- The querying party encrypts its input and sends it to the answering parties.
- Each answering party, using secure MPC and HE, runs inference on the encrypted input and returns secret shares of its one-hot prediction, so that no individual prediction is revealed.
- The privacy guardian aggregates the shares and adds noise, ensuring the released result satisfies differential privacy; the querying party thus receives a label it can use to improve its local model without compromising the confidentiality of anyone's underlying data. A runnable sketch of this round trip follows the list.
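The sketch below traces that message flow end to end with the cryptographic steps replaced by plaintext stand-ins: in the real protocol the query stays encrypted under HE, inference runs inside secure computation, and predictions travel as secret shares. All function names, the model stub, and the noise scale are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(42)
NUM_CLASSES = 3
NUM_ANSWERING = 10

def answering_party_predict(query: np.ndarray) -> np.ndarray:
    """One answering party runs inference and emits a one-hot prediction."""
    logits = rng.normal(size=NUM_CLASSES)           # stand-in for a local model
    return np.eye(NUM_CLASSES)[np.argmax(logits)]   # one-hot encoded label

def privacy_guardian(one_hot_preds: list[np.ndarray], sigma: float) -> int:
    """The PG sums the (secret-shared) votes and adds noise before the argmax."""
    votes = np.sum(one_hot_preds, axis=0)
    return int(np.argmax(votes + rng.normal(0.0, sigma, size=votes.shape)))

# Step 1: the querying party sends its (encrypted) input to each answering party.
query = rng.normal(size=16)
# Step 2: each answering party performs private inference and casts a vote.
preds = [answering_party_predict(query) for _ in range(NUM_ANSWERING)]
# Step 3: the PG releases only the noisy aggregate label to the querying party.
print("differentially private label:", privacy_guardian(preds, sigma=2.0))
```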
The protocol provides rigorous guarantees against semi-honest adversaries, and differential privacy follows from adding calibrated Gaussian noise to the aggregated votes before the label is released, using noise-adding mechanisms similar to PATE; because only the noisy label leaves the protocol, any downstream use of it is covered by differential privacy's post-processing property.
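To make that mechanism concrete, here is a sketch in the style of PATE's Confident-GNMax aggregator: answer a query only when the parties' noisy consensus clears a threshold, then release a noisy argmax. It assumes the vote histogram is available in the clear (in CaPC it is assembled from secret shares), and the threshold and noise scales are illustrative rather than the paper's calibrated values.

```python
import numpy as np

def confident_gnmax(votes: np.ndarray, threshold: float, sigma_threshold: float,
                    sigma_vote: float, rng: np.random.Generator) -> int | None:
    """Answer a query only when the parties' noisy consensus is strong enough.

    Returns the noisy plurality label, or None when the consensus check fails
    and the query is rejected (saving privacy budget for easier queries).
    """
    # Noisy consensus check on the top vote count.
    if votes.max() + rng.normal(0.0, sigma_threshold) < threshold:
        return None
    # Noisy argmax over the full vote histogram yields the released label.
    return int(np.argmax(votes + rng.normal(0.0, sigma_vote, size=votes.shape)))

rng = np.random.default_rng(1)
votes = np.array([8.0, 1.0, 1.0])  # strong agreement among 10 answering parties
print(confident_gnmax(votes, threshold=6.0, sigma_threshold=1.5,
                      sigma_vote=2.0, rng=rng))
```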
Experimental Evaluation
Experiments were conducted on the SVHN and CIFAR10 datasets, under both homogeneous and heterogeneous model settings, to assess the framework's ability to improve model utility while preserving privacy:
- Accuracy Gains: CaPC improved model accuracy by 4.09% on CIFAR10 and by up to 2.64% in heterogeneous settings on SVHN, showing that the protocol adds utility even to models that already perform well.
- Data Skew and Active Learning: When data skew was introduced, CaPC substantially improved fairness-oriented metrics such as balanced accuracy, especially when combined with active learning strategies that target underrepresented classes more effectively; a sketch of one such selection strategy follows this list.
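One common selection heuristic in this spirit is margin sampling: spend queries only on the points where the local model is least decided between its top two classes. The sketch below is a generic illustration of the idea, not the paper's exact procedure; the function name and budget are assumptions.

```python
import numpy as np

def margin_sample(probs: np.ndarray, budget: int) -> np.ndarray:
    """Pick the unlabeled points whose top-2 class probabilities are closest.

    probs: (N, C) softmax outputs of the querying party's local model.
    Returns indices of the `budget` most ambiguous points, i.e. the queries
    most worth spending privacy budget on.
    """
    top2 = np.sort(probs, axis=1)[:, -2:]   # two largest probabilities per row
    margins = top2[:, 1] - top2[:, 0]       # small margin = uncertain prediction
    return np.argsort(margins)[:budget]

rng = np.random.default_rng(2)
probs = rng.dirichlet(np.ones(3), size=100)      # mock predictions on an unlabeled pool
query_indices = margin_sample(probs, budget=10)  # route only these through CaPC
```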
Computational Performance and Cost
While the main computational cost arises from secure MPC during inference, the overall overhead remains manageable, showing promise for scalable deployment as underlying cryptographic tools improve.
Implications and Future Directions
CaPC represents a step forward in privacy-preserving collaborative learning, offering a compelling alternative to federated learning by overcoming its typical limitations. Continued research may explore CaPC's role in fairness, especially in scenarios where sensitive attributes drive discrimination. Its adaptation to settings with fewer physical parties, for instance by having each party contribute multiple virtual models, is another promising avenue for broadening its applicability.
The paper sheds light on pathways for secure collaborative modeling, advancing both the theoretical understanding and the practical application of privacy-preserving techniques in sensitive industries. It also invites speculation about how such frameworks might integrate into the broader evolution of decentralized AI systems, adding the layers of security that collaborative environments demand.