A Hybrid Approach to Privacy-Preserving Federated Learning
This paper addresses a central challenge in machine learning and data privacy: building federated learning (FL) systems that provide robust privacy guarantees without sacrificing model accuracy. The authors, Stacey Truex et al., propose a hybrid approach that integrates Differential Privacy (DP) with Secure Multiparty Computation (SMC) to realize privacy-preserving FL. The resulting system is designed to mitigate inference attacks during training and to protect the final trained model, delivering both formal privacy guarantees and high predictive accuracy.
Background and Motivation
Federated Learning is a decentralized ML paradigm in which multiple entities collaboratively train a model without centrally aggregating their datasets. This is particularly valuable where data sharing is restricted by privacy regulation (e.g., GDPR) or by competitive concerns among participating organizations. Traditional FL approaches, while keeping data local, do not by themselves provide sufficient protection against sophisticated inference attacks, which can occur both during training (inference from the exchanged model updates) and after deployment (membership inference attacks against the trained model).
Core Contributions
The authors' key contributions are as follows:
- Hybrid FL System: The integration of DP and SMC protects against both outsider and insider threats by ensuring that individual responses are obfuscated and aggregated securely.
- Tunable Trust Parameter: The system introduces a customizable trust parameter, t, reflecting the minimum number of honest, non-colluding parties the deployment is assumed to contain; noise levels are scaled according to this assumption. The parameter lets practitioners balance privacy against accuracy under their specific trust model (see the illustrative sketch after this list).
- Multi-Model Applicability: The proposed approach is validated across three different ML models—Decision Trees (DT), Convolutional Neural Networks (CNN), and Support Vector Machines (SVM)—demonstrating its versatility.
- Scalability: The approach is shown to be scalable, maintaining high accuracy even with an increasing number of participants.
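As a rough illustration of how such a trust parameter could translate into per-party noise, the sketch below splits a centrally calibrated Gaussian noise scale across the parties assumed honest, so that their combined noise matches the central target. The exact calibration used in the paper may differ; the function name, noise scale, and party counts here are purely hypothetical.

```python
import numpy as np

def per_party_noise_scale(sigma_central: float, n_parties: int, t_honest: int) -> float:
    """Illustrative only: split a centrally calibrated Gaussian noise scale
    across the parties assumed honest, so their combined noise still has
    standard deviation sigma_central.

    sigma_central : noise scale a trusted curator would add for (eps, delta)-DP
    n_parties     : total number of participants
    t_honest      : trust parameter -- minimum number of honest parties assumed
    """
    assert 1 <= t_honest <= n_parties
    # The sum of t independent N(0, sigma^2 / t) draws is N(0, sigma^2),
    # so each honest party only needs to add sigma_central / sqrt(t_honest).
    return sigma_central / np.sqrt(t_honest)

# Higher trust (more honest parties assumed) -> less noise per party.
for t in (1, 5, 10):
    print(t, per_party_noise_scale(sigma_central=4.0, n_parties=10, t_honest=t))
```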
Technical Overview
Hybrid Differential Privacy and SMC: Differential Privacy ensures that no single data point significantly influences the model's output by adding noise proportional to the sensitivity of the function being computed, yielding formal (ϵ,δ)-DP guarantees. The drawback is that this noise degrades model accuracy. The contribution here is to blend DP with SMC, specifically threshold homomorphic encryption based on the Paillier cryptosystem. Because individual responses are encrypted and only their aggregate is ever revealed, each party can add less noise than a purely local DP scheme would require, preserving accuracy while keeping the computation private.
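The sketch below illustrates this pattern with the python-paillier (`phe`) library: each party adds Gaussian noise to its local response and encrypts it, and the aggregator sums ciphertexts so that only the aggregate is ever decrypted. Note that `phe` implements standard, non-threshold Paillier, so a single key pair stands in here for the threshold scheme described in the paper; the noise scale and helper name are illustrative assumptions, not the authors' code.

```python
# pip install phe numpy
import numpy as np
from phe import paillier

# Key generation. NOTE: python-paillier implements standard (non-threshold)
# Paillier; the paper uses a threshold variant so no single party can decrypt
# alone. A single key pair stands in for that here.
public_key, private_key = paillier.generate_paillier_keypair(n_length=1024)

def noisy_encrypted_response(local_value: float, sigma: float) -> paillier.EncryptedNumber:
    """Each party perturbs its local value with Gaussian noise (DP step)
    and encrypts it (SMC step) before sending it to the aggregator."""
    noisy = local_value + np.random.normal(0.0, sigma)
    return public_key.encrypt(float(noisy))

# Three parties contribute noisy, encrypted responses.
ciphertexts = [noisy_encrypted_response(v, sigma=0.5) for v in (1.2, 0.9, 1.1)]

# The aggregator sums ciphertexts homomorphically; only the aggregate is
# ever decrypted, so individual noisy responses stay hidden.
encrypted_sum = ciphertexts[0]
for c in ciphertexts[1:]:
    encrypted_sum = encrypted_sum + c

print("aggregate:", private_key.decrypt(encrypted_sum))
```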
Algorithmic Implementation:
- Decision Trees: The FL system decomposes tree construction into queries for instance counts and class counts. Each party's query response is perturbed with DP noise, encrypted, and aggregated via SMC to minimize the noise impact. The aggregator decrypts only the aggregated counts and uses them to build the DT iteratively.
- Neural Networks: Each party trains locally for one epoch and adds DP noise to the gradient updates. These noisy updates are encrypted and sent to the aggregator, which averages the parameters and redistributes the updated model for the next epoch.
- Support Vector Machines: As with the neural network, each party trains locally, clips its update to respect the DP sensitivity bound, adds noise, encrypts the result, and transmits it to the aggregator for secure averaging (a simplified sketch of this clip, noise, and average step follows this list).
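The following is a minimal sketch of the per-party clip-and-noise step and the aggregator's averaging described in the last two items, assuming plain NumPy arrays for model updates. Encryption is omitted for brevity, and the learning rate, clipping norm, noise scale, and function names are illustrative assumptions rather than values taken from the paper.

```python
import numpy as np

def local_dp_update(grad: np.ndarray, lr: float, clip_norm: float, sigma: float) -> np.ndarray:
    """One party's contribution: take a local gradient step, clip the update
    so its L2 norm is at most clip_norm (bounding sensitivity), then add
    Gaussian noise calibrated to that bound. Encryption of the result,
    as used in the paper, is omitted here."""
    update = -lr * grad
    norm = np.linalg.norm(update)
    if norm > clip_norm:
        update = update * (clip_norm / norm)
    return update + np.random.normal(0.0, sigma, size=update.shape)

def aggregate(updates: list[np.ndarray]) -> np.ndarray:
    """Aggregator averages the noisy updates; in the paper this average is
    computed over ciphertexts and only the result is decrypted."""
    return np.mean(updates, axis=0)

# Toy round with three parties sharing the same model dimensionality.
w = np.zeros(4)
grads = [np.random.randn(4) for _ in range(3)]
updates = [local_dp_update(g, lr=0.1, clip_norm=1.0, sigma=0.05) for g in grads]
w = w + aggregate(updates)
print(w)
```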
Experimental Results
The experimental section validates the proposed approach against several metrics:
- Privacy Budget Impact: The system remains robust across a range of privacy budgets (varying ϵ), maintaining high F1-scores. As ϵ decreases, i.e., as the privacy guarantee tightens, the hybrid approach increasingly outperforms a purely local DP baseline.
- Scalability: It maintains high accuracy even as the number of participating parties increases, showing negligible degradation in performance when compared to other state-of-the-art privacy-preserving approaches.
- Trust Parameter: Results with varying trust settings underscore the system's flexibility, showing how a higher trust parameter (more honest parties assumed, hence lower collusion risk) permits substantially reduced noise levels while preserving accuracy.
Implications and Future Directions
The implications of this hybrid approach are twofold. Practically, organizations can deploy FL systems that preserve data privacy without compromising model performance, facilitating broader adoption in privacy-sensitive domains such as healthcare, finance, and telecommunications. Theoretically, the work advances the state of the art in privacy-preserving ML by presenting a versatile framework adaptable to various ML models and deployment scenarios.
Future Research Directions:
- Optimizing Homomorphic Aggregation: Further research could explore alternative homomorphic encryption techniques or optimization strategies to reduce computational overhead and improve communication efficiency.
- Extended Model Support: Expanding the applicability to other complex models such as Recurrent Neural Networks (RNNs) or unsupervised learning algorithms could broaden the impact.
- Dynamic Trust Models: Investigating dynamic and adaptive trust models that can adjust in real-time based on observed behavior of parties could provide even more robust privacy guarantees.
In conclusion, this paper presents a technically rigorous and empirically validated approach to hybrid privacy-preserving federated learning, promising a significant advancement in secure collaborative ML.