
Algorithms that Remember: Model Inversion Attacks and Data Protection Law (1807.04644v2)

Published 12 Jul 2018 in cs.LG, cs.CR, and cs.CY

Abstract: Many individuals are concerned about the governance of machine learning systems and the prevention of algorithmic harms. The EU's recent General Data Protection Regulation (GDPR) has been seen as a core tool for achieving better governance of this area. While the GDPR does apply to the use of models in some limited situations, most of its provisions relate to the governance of personal data, while models have traditionally been seen as intellectual property. We present recent work from the information security literature around 'model inversion' and 'membership inference' attacks, which indicate that the process of turning training data into machine learned systems is not one-way, and demonstrate how this could lead some models to be legally classified as personal data. Taking this as a probing experiment, we explore the different rights and obligations this would trigger and their utility, and posit future directions for algorithmic governance and regulation.

Authors (3)
  1. Michael Veale (16 papers)
  2. Reuben Binns (35 papers)
  3. Lilian Edwards (5 papers)
Citations (176)

Summary

Analyzing Model Inversion Attacks and Data Protection

The paper "Algorithms that remember: model inversion attacks and data protection law" by Michael Veale, Reuben Binns, and Lilian Edwards explores the complex intersection between ML security vulnerabilities and data protection laws, specifically in the context of the European Union’s General Data Protection Regulation (GDPR). The authors delve into how evolving vulnerabilities in ML systems, such as model inversion and membership inference attacks, have significant implications for the classification of models as personal data, thereby potentially triggering varied rights and obligations under data protection frameworks.

Background on Data Protection and Machine Learning

Traditionally, legal discourse around machine learning models has focused on their status as intellectual property (IP), while the GDPR primarily addresses the handling of personal data. The GDPR imposes extensive requirements on data controllers, including lawful bases for processing, data subject rights, and obligations to protect the confidentiality and integrity of personal data. This regulatory framework hinges on the threshold of personal identifiability, which covers any data that makes an individual identifiable, even indirectly.

Model Inversion and Membership Inference Attacks

Recent information security research shows that the transformation of training data into a model is not irreversible. Model inversion attacks reverse-engineer models to extract information about the original training data, undermining assumptions about their opacity. Membership inference attacks let adversaries determine whether a specific individual's record was part of a model's training dataset. Both attack vectors illustrate how models, typically treated as non-personal artifacts in the data lifecycle, can act as conduits for personal data leakage.
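
To make the threat concrete, below is a minimal sketch of a confidence-thresholding membership inference attack, one of the simplest variants in this literature. The synthetic dataset, the decision-tree target model, and the fixed threshold are illustrative assumptions rather than the paper's experimental setup.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Train a target model on "member" records; hold out "non-member" records.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_member, y_member = X[:1000], y[:1000]        # in the training set
X_nonmember, y_nonmember = X[1000:], y[1000:]  # never seen by the model

# An unconstrained tree memorizes its training data, which the attack exploits.
target_model = DecisionTreeClassifier(random_state=0).fit(X_member, y_member)

def confidence_on_true_label(model, X, y):
    """Probability the model assigns to each record's true label."""
    probs = model.predict_proba(X)
    return probs[np.arange(len(y)), y]

# Members tend to receive higher confidence than non-members; the attack
# guesses "member" whenever the confidence exceeds a threshold.
threshold = 0.9
member_conf = confidence_on_true_label(target_model, X_member, y_member)
nonmember_conf = confidence_on_true_label(target_model, X_nonmember, y_nonmember)

tpr = (member_conf > threshold).mean()      # members correctly flagged
fpr = (nonmember_conf > threshold).mean()   # non-members wrongly flagged
print(f"attack TPR={tpr:.2f}, FPR={fpr:.2f}, advantage={tpr - fpr:.2f}")
```

The gap between the true-positive and false-positive rates (the attacker's "advantage") is what indicates that the model leaks information about who was in its training set; shadow-model attacks in the literature refine the same idea.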

Legal Implications

The paper suggests that these vulnerabilities necessitate a re-evaluation of the legal status of models, particularly under the GDPR. If models vulnerable to these attacks were classified as personal data, the GDPR's protections would extend to them. This classification would not only impose stricter data protection obligations on data controllers but could also grant data subjects rights to access, erase, and object to the use of models derived from their data.

Practical Consequences for Data Controllers and Subjects

For data subjects, if models are considered personal data, they would gain stronger rights to transparency about how their data flows into and through models. For example, they could request information on how a model was derived and distributed, or demand erasure of their data from it. Exercising these rights, however, presents practical challenges given the intricacies of model-derived data.

Conversely, data controllers would face increased compliance responsibilities to ensure data protection by design, for instance by embedding privacy-preserving techniques such as differential privacy into model development. They would need to mitigate these vulnerabilities preemptively or risk breaching the GDPR's security principle if personal data leaks through model exploitation.
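
As an illustration of what data protection by design could mean in practice, the sketch below shows the core update step of DP-SGD-style training: clip each example's gradient, then add calibrated Gaussian noise before applying the update. The function name, hyperparameter values, and toy data are assumptions for illustration only; a real deployment would also track the cumulative privacy budget.

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, clip_norm=1.0, noise_multiplier=1.1, lr=0.1):
    """One differentially private SGD update (illustrative hyperparameters).

    per_example_grads has shape (batch_size, n_params): one gradient per record.
    """
    # 1. Clip each per-example gradient to bound any single record's influence.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    clipped = per_example_grads * np.minimum(1.0, clip_norm / (norms + 1e-12))

    # 2. Sum, add Gaussian noise calibrated to the clipping bound, and average.
    batch_size = per_example_grads.shape[0]
    noise = np.random.normal(0.0, noise_multiplier * clip_norm, size=params.shape)
    noisy_grad = (clipped.sum(axis=0) + noise) / batch_size

    # 3. Take an ordinary gradient step with the privatized gradient.
    return params - lr * noisy_grad

# Toy usage: 32 per-example gradients for a 10-parameter model.
rng = np.random.default_rng(0)
params = rng.normal(size=10)
per_example_grads = rng.normal(size=(32, 10))
params = dp_sgd_step(params, per_example_grads)
```

The clipping bound limits how much any single individual's record can shift the model, and the added noise masks the remainder, which is precisely the property that blunts inversion and membership inference attacks.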

Prospects for Governance and Regulation

While acknowledging that such a reinterpretation could recalibrate the power balance between data controllers and data subjects, the authors caution against viewing it as a panacea for governing ML systems. Tying legal status to whether a particular model happens to be vulnerable to these attacks can seem arbitrary if broader governance frameworks do not concurrently address machine learning's wider impact on data privacy.

The paper argues that the intrinsic limitations of the GDPR and similar frameworks in addressing dynamic, data-driven technologies point to the need for a broader rethinking of regulatory paradigms. Attention to the real-world enforceability of such provisions is critical for effective data governance in the age of machine learning and AI, and further research into securing models, and into the exact scenarios in which these attacks are feasible, remains essential for developing informed, technically sound regulatory policy.
