
Hard to Forget: Poisoning Attacks on Certified Machine Unlearning

Published 17 Sep 2021 in cs.LG and cs.CR | (2109.08266v2)

Abstract: The right to erasure requires removal of a user's information from data held by organizations, with rigorous interpretations extending to downstream products such as learned models. Retraining from scratch with the particular user's data omitted fully removes its influence on the resulting model, but comes with a high computational cost. Machine "unlearning" mitigates the cost incurred by full retraining: instead, models are updated incrementally, possibly only requiring retraining when approximation errors accumulate. Rapid progress has been made towards privacy guarantees on the indistinguishability of unlearned and retrained models, but current formalisms do not place practical bounds on computation. In this paper we demonstrate how an attacker can exploit this oversight, highlighting a novel attack surface introduced by machine unlearning. We consider an attacker aiming to increase the computational cost of data removal. We derive and empirically investigate a poisoning attack on certified machine unlearning where strategically designed training data triggers complete retraining when removed.

Summary

  • The paper demonstrates that poisoning attacks can force certified machine unlearning to require full retraining, undermining computational efficiency.
  • It employs a constrained optimization approach where attackers inject subtle modifications that maximize the unlearning cost without degrading benign performance.
  • Experiments show that even minimal poisoned data significantly raises retraining frequency, exposing a key vulnerability in current unlearning methods.

Introduction

This essay examines the research paper titled "Hard to Forget: Poisoning Attacks on Certified Machine Unlearning" (2109.08266). The paper addresses a critical vulnerability in machine learning systems related to the efficient removal of users' data, a process referred to as "machine unlearning." In compliance with data protection regulations like the GDPR and CCPA, organizations are compelled to erase or "unlearn" data, ensuring that trained models do not retain any influence from the removed data. Current methodologies for machine unlearning offer computational efficiency compared to retraining models from scratch. However, this paper identifies a novel attack vector—data poisoning—that exploits this unlearning process, forcing systems to resort to full retraining, which significantly increases computational costs.

Machine Unlearning and its Vulnerabilities

Machine unlearning has emerged as a solution to the computational cost of fulfilling data-erasure requests without retraining models from scratch. The paper focuses on certified machine unlearning methods, which guarantee that an approximately unlearned model is statistically indistinguishable from one retrained from scratch without the data in question, a crucial privacy guarantee.
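To make the retrain-or-update decision concrete, the following is a minimal sketch in the spirit of certified-removal schemes. The logistic-regression model, the one-step Newton removal update, the gradient-residual error proxy, and the `budget` threshold are all illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logreg(X, y, lam, iters=25):
    """l2-regularized logistic regression trained with Newton's method."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        p = sigmoid(X @ w)
        g = X.T @ (p - y) + lam * w
        H = X.T @ (X * (p * (1 - p))[:, None]) + lam * np.eye(X.shape[1])
        w = w - np.linalg.solve(H, g)
    return w

def unlearn_one(w, X, y, idx, lam, err, budget):
    """Remove sample `idx` with a single Newton step, tracking a
    residual-norm error proxy; once the accumulated error exceeds
    `budget`, fall back to full retraining (the costly path the
    paper's attack tries to force)."""
    keep = np.ones(len(y), dtype=bool)
    keep[idx] = False
    Xr, yr = X[keep], y[keep]
    p = sigmoid(Xr @ w)
    g = Xr.T @ (p - yr) + lam * w                 # gradient on remaining data
    H = Xr.T @ (Xr * (p * (1 - p))[:, None]) + lam * np.eye(X.shape[1])
    w_new = w - np.linalg.solve(H, g)             # approximate unlearned model
    p_new = sigmoid(Xr @ w_new)
    err += np.linalg.norm(Xr.T @ (p_new - yr) + lam * w_new)  # leftover residual
    if err > budget:                              # error budget exhausted:
        return fit_logreg(Xr, yr, lam), 0.0, True # retrain from scratch
    return w_new, err, False
```

For a benign removal near the optimum, the residual left by the Newton step is tiny and the budget is barely touched; a poisoned point is designed so that this residual is large, draining the budget quickly.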

However, the paper reveals a critical oversight: these methods lack safeguards against adversaries who inflate the computational cost of data removal through strategically poisoned training data. By contributing data that triggers complete retraining when removed, an attacker can erase the efficiency gains of machine unlearning and revert the system to costly full retraining.

Attacks on Unlearning Algorithms

The attack proposed in the paper is a poisoning attack aimed at slowing down the unlearning process. The adversary controls a fraction of the training data and injects subtle modifications that maximize the computational cost of removal without degrading the model's predictive accuracy on benign inputs. The attack is cast as a constrained optimization problem: the attacker searches for poisoned points that maximize the cost of unlearning them while keeping perturbations small enough to evade detection.
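The constrained-optimization view can be sketched as projected gradient ascent on a proxy for removal cost. Here the white-box `removal_cost` proxy (the norm of the inverse-Hessian-weighted gradient of the poisoned points), the finite-difference gradient, and the l-infinity radius `eps` are illustrative stand-ins for the paper's formulation, not its actual objective.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def removal_cost(Xp, yp, w, H_inv):
    # White-box proxy for unlearning cost: the norm of the one-step Newton
    # update that removing the poisoned points (Xp, yp) would induce.
    g = Xp.T @ (sigmoid(Xp @ w) - yp)   # gradient contribution of the poison
    return np.linalg.norm(H_inv @ g)

def poison_pga(Xp, yp, w, H_inv, eps=0.5, step=0.2, iters=50, fd=1e-5):
    """Projected gradient ascent on the removal-cost proxy, keeping each
    feature perturbation inside an l-infinity ball of radius eps."""
    delta = np.zeros_like(Xp)
    best, best_cost = delta.copy(), removal_cost(Xp, yp, w, H_inv)
    for _ in range(iters):
        cost = removal_cost(Xp + delta, yp, w, H_inv)
        if cost > best_cost:
            best, best_cost = delta.copy(), cost
        grad = np.zeros_like(delta)
        for i in range(delta.shape[0]):       # finite-difference gradient;
            for j in range(delta.shape[1]):   # fine at toy scale
                d = delta.copy()
                d[i, j] += fd
                grad[i, j] = (removal_cost(Xp + d, yp, w, H_inv) - cost) / fd
        delta = np.clip(delta + step * grad, -eps, eps)  # ascend, then project
    return Xp + best  # best perturbation found, never worse than no attack
```

The projection step is what models the detection constraint: perturbations stay inside a small norm ball so the poisoned points still look like plausible training data.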

The attack was evaluated in both white-box settings, where the attacker has full knowledge of the model and training data, and grey-box settings, where model parameters are known but the training data can only be approximated. The findings show that even a small fraction of poisoned data can drastically reduce unlearning efficiency across a range of model parameters and perturbation constraints.

Experimental Insights

The experiments show that poisoning substantially increases the rate of full retraining. Across different perturbation norms and radii, the results exhibit a marked rise in retraining frequency, and hence in computational burden. Notably, the parameter settings that improve unlearning efficiency (e.g., choices of regularization strength and noise magnitude) also tend to be the most susceptible to poisoning.
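The retraining-frequency metric can be illustrated with a toy accounting of an error budget over a stream of removal requests; the per-request `costs` and the reset-on-retrain rule are assumptions for illustration, not the paper's measurement protocol.

```python
def retrain_frequency(costs, budget):
    """Count full retrains over a stream of removal requests: each request
    consumes costs[i] of an accumulated error budget, and a retrain resets
    the accumulator (a toy version of the retraining-frequency metric)."""
    acc, retrains = 0.0, 0
    for c in costs:
        acc += c
        if acc > budget:   # budget exhausted: full retrain, reset accumulator
            retrains += 1
            acc = 0.0
    return retrains

# Benign removals consume little of the budget; poisoned removals consume
# much more, so the same budget forces far more frequent retraining.
```

Under this accounting, raising the per-request cost by an order of magnitude raises the retrain count roughly proportionally, which is the computational burden the attack targets.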

Moreover, transferability tests confirm that attacks remain effective even when attackers train on surrogate data, highlighting an inherent vulnerability in certified unlearning algorithms when deployed in realistic scenarios with partial adversarial knowledge.

Implications and Future Directions

This vulnerability has significant implications for both the theory and practice of machine unlearning. It challenges existing bounds on computational cost and the assumption of a passive threat model, and it suggests an urgent need for unlearning frameworks that account not only for privacy and model accuracy but also for adversarial robustness of the computational cost itself.

Future work could focus on defenses that detect or preemptively mitigate such poisoning attempts, for example via anomaly detection or training procedures that reduce the model's dependency on specific data subsets. Another direction is extending certified unlearning to broader model classes such as deep neural networks, where data and computational scale add further complexity.

Conclusion

The paper "Hard to Forget: Poisoning Attacks on Certified Machine Unlearning" illuminates a previously underexplored domain regarding the vulnerabilities of machine unlearning systems to adversarial manipulation of computational costs. By identifying a strategic attack surface, it urges the community to rethink and fortify the structures and assurances provided by current unlearning methodologies. As data protection regulations globally tighten and the push towards privacy-ensuring technologies intensifies, addressing these vulnerabilities becomes paramount to safeguarding both user data and operational efficacy.
