- The paper demonstrates that analyzing gradient updates in collaborative learning enables membership inference with precision up to 0.99 on sensitive datasets.
- It reveals that models inadvertently learn and leak features of the training data that are unrelated to the learning task, with property inference attacks reaching AUC scores up to 1.0.
- It shows that an active adversary can use an adversarial multi-task loss to amplify feature leakage, highlighting the need for robust privacy defenses.
Exploiting Unintended Feature Leakage in Collaborative Learning
This paper provides a comprehensive analysis of the privacy risks associated with collaborative and federated learning techniques. The authors present both passive and active inference attacks that exploit unintended information leakage via model updates in such distributed machine learning frameworks.
Core Contributions
The paper addresses three primary concerns:
- Membership Inference: The ability of an adversary to determine if a specific data point was part of the training data set.
- Property Inference: The ability of an adversary to infer properties of the training data, even if these properties are unrelated to the primary learning task of the model.
- Temporal Inference: The ability to deduce when certain properties appear or disappear in the training data over the course of the collaborative training process.
Methodology and Results
Inference Attacks
- Membership Inference: By analyzing gradient updates, an adversary can determine with high precision (up to 0.99) whether specific data points were present in the training data. This is particularly concerning for sensitive datasets such as Yelp-health reviews and FourSquare check-in locations, where identifying specific records compromises personal privacy (a minimal sketch of the gradient-based leakage idea appears after this list).
- Property Inference: The paper shows that deep learning models often learn features that are unrelated to the primary classification task but are nonetheless present in the training data. For instance, the updates of a gender classifier can reveal extraneous attributes such as whether the subjects wear glasses. The property inference attacks achieve AUC scores up to 1.0, indicating that the shared model leaks significant auxiliary information. This has profound implications for the privacy of the training data, since properties unrelated to the model's primary function should not be inferable (see the passive-attack sketch after this list).
- Active Inference: By optimizing a joint loss that combines the main task with a property-classification task, an adversarial participant can actively steer the shared model so that the targeted property becomes easier to infer. This multi-task learning approach increases the leakage of the unintended feature, as demonstrated by higher separability in feature space when the adversarial loss is included in training (see the joint-loss sketch after this list).
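
The gradient-based leakage behind the membership attack is easiest to see with an embedding layer: in a text model, the embedding matrix receives non-zero gradient rows only for tokens that actually appear in a participant's batch, so an observer of the update learns which words (and hence which records) were used. Below is a minimal PyTorch sketch of that observation; the toy model, vocabulary size, and token ids are illustrative assumptions, not the paper's actual models or datasets.

```python
# Minimal sketch: embedding-layer gradients reveal which tokens were in a batch.
# (Toy model and made-up token ids; for illustration only.)
import torch
import torch.nn as nn

VOCAB, DIM = 1000, 16
embedding = nn.Embedding(VOCAB, DIM)
classifier = nn.Linear(DIM, 2)

def forward(token_ids):
    # Average the word embeddings, then classify (toy text model).
    return classifier(embedding(token_ids).mean(dim=0, keepdim=True))

# One participant's local batch: a "document" containing a few token ids.
batch_tokens = torch.tensor([12, 57, 903])
label = torch.tensor([1])

loss = nn.functional.cross_entropy(forward(batch_tokens), label)
loss.backward()

# An observer of the update sees the embedding-layer gradient: rows with
# non-zero gradient reveal exactly which tokens were in the batch.
leaked = (embedding.weight.grad.abs().sum(dim=1) > 0).nonzero().flatten()
print(leaked.tolist())  # -> [12, 57, 903]
```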
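For the passive property attack, the adversary trains a binary meta-classifier to distinguish model updates computed on auxiliary data that has the property from updates computed on data that does not, and then applies it to the updates observed during training. The sketch below illustrates that pipeline with synthetic stand-ins for the updates; the `fake_update` generator, its dimensions, and the classifier choice are assumptions for illustration only.

```python
# Minimal sketch of the passive property-inference pipeline.
# (Synthetic vectors stand in for real flattened gradient updates.)
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

def fake_update(with_property, dim=50):
    # Stand-in for a flattened gradient update; updates computed on data
    # with the property are shifted slightly in a few coordinates.
    base = rng.normal(size=dim)
    if with_property:
        base[:5] += 0.5
    return base

# Attacker builds a labeled training set of updates from auxiliary data.
X_train = np.stack([fake_update(i % 2 == 0) for i in range(400)])
y_train = np.array([i % 2 == 0 for i in range(400)])

meta = RandomForestClassifier(n_estimators=100, random_state=0)
meta.fit(X_train, y_train)

# Updates observed from the victim during collaborative training.
X_obs = np.stack([fake_update(i % 2 == 0) for i in range(100)])
y_obs = np.array([i % 2 == 0 for i in range(100)])
print("AUC:", roc_auc_score(y_obs, meta.predict_proba(X_obs)[:, 1]))
```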
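The active attack's joint objective can be sketched as follows: the adversary adds a property-classification head and optimizes the main-task loss plus a weighted property loss, so the shared representation separates records with and without the targeted property. The linear feature extractor, toy tensors, and `alpha` weight below are illustrative assumptions.

```python
# Minimal sketch of the active adversary's joint multi-task loss.
# (Toy model and random data; for illustration only.)
import torch
import torch.nn as nn

torch.manual_seed(0)

shared = nn.Linear(20, 10)           # stand-in for the shared feature extractor
main_head = nn.Linear(10, 2)         # agreed-upon main task
prop_head = nn.Linear(10, 2)         # adversary's property classifier
opt = torch.optim.SGD(
    list(shared.parameters()) + list(main_head.parameters()) + list(prop_head.parameters()),
    lr=0.1,
)

x = torch.randn(32, 20)
y_main = torch.randint(0, 2, (32,))  # main-task labels
y_prop = torch.randint(0, 2, (32,))  # property labels known to the adversary
alpha = 1.0                          # weight on the property objective

for _ in range(10):
    feats = torch.relu(shared(x))
    loss_main = nn.functional.cross_entropy(main_head(feats), y_main)
    loss_prop = nn.functional.cross_entropy(prop_head(feats), y_prop)
    loss = loss_main + alpha * loss_prop  # joint multi-task loss
    opt.zero_grad()
    loss.backward()
    opt.step()

# The adversary uploads its updates to `shared`, nudging its features toward
# separating records with and without the targeted property.
```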
Practical Implications
- Data Privacy: The demonstrated attacks reveal significant privacy risks in scenarios where multiple entities cooperate to train shared models, especially when sensitive personal data is involved. This is crucial for applications in healthcare, finance, and any domain where data privacy is paramount.
- Model Robustness: The leakage of unintended features underscores a need for developing robust defenses in collaborative learning. This includes methods for ensuring that the learned features strictly pertain to the designated learning task and do not inadvertently capture auxiliary information.
Limitations and Future Directions
The attacks' performance degrades as the number of participants in collaborative learning increases. Furthermore, attribution of inferred properties in multi-party settings remains challenging without additional contextual information. These limitations suggest that future work should focus on enhancing scalability and reducing the adversary’s ability to draw inferences in larger collaborative settings.
Moreover, the paper evaluates several potential defenses, such as selective gradient sharing, dimensionality reduction, and dropout, but finds that these are insufficient to fully thwart the described attacks. This highlights an avenue for future research to develop more effective privacy-preserving techniques.
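One of the evaluated defenses, sharing only a fraction of gradient values, can be sketched as below. The top-magnitude selection rule and the fixed sharing fraction are illustrative assumptions, and, as noted above, the paper finds this kind of mitigation insufficient on its own.

```python
# Minimal sketch of selective gradient sharing (share only a fraction of entries).
import numpy as np

def share_top_fraction(grad, fraction=0.1):
    """Zero out all but the largest-magnitude `fraction` of gradient entries."""
    k = max(1, int(fraction * grad.size))
    keep = np.argsort(np.abs(grad))[-k:]  # indices of the largest entries
    shared = np.zeros_like(grad)
    shared[keep] = grad[keep]
    return shared

grad = np.random.default_rng(0).normal(size=1000)
shared = share_top_fraction(grad, fraction=0.1)
print(np.count_nonzero(shared), "of", grad.size, "entries shared")
```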
Conclusion
The findings of this paper bring to light inherent vulnerabilities in collaborative learning frameworks. They prompt significant reconsideration of privacy and security measures necessary to protect sensitive data in distributed machine learning environments. Future research should aim to develop models that adhere to a principle of least privilege, learning only what is necessary for the task at hand, and to explore scalable participant-level differential privacy mechanisms tailored to such collaborative settings.