Overview of Boundary Unlearning
Boundary Unlearning is presented as an efficient response to the "right to be forgotten" and to the threat of data poisoning: it enables machine learning models to unlearn an entire class of training data without retraining from scratch. The paper shifts focus away from traditional parameter scrubbing, which is computationally prohibitive given the large parameter space of deep neural networks (DNNs), and instead proposes manipulating the DNN's decision boundary to mimic the decision behavior of a model retrained only on the remaining data.
Methodology
Boundary Unlearning introduces two novel methods, Boundary Shrink and Boundary Expanding, which adjust the decision space of a trained model to mimic the outputs of a retrained model and thereby remove the influence of a specified set of training data:
- Boundary Shrink: This method uses neighbor searching to find the nearest incorrect label for each forgetting sample (the data points being unlearned). By finetuning the model on these relabeled samples, it breaks the decision boundary of the forgetting class while preserving the boundaries among the remaining classes (a minimal sketch appears after this list).
- Boundary Expanding: This approach adds a 'shadow class' neuron to the classifier head, finetunes the model with the forgetting data assigned to this shadow class, and then prunes the extra neuron. This disperses the influence of the forgetting samples across other classes without altering the decision boundaries of the remaining classes (also sketched after this list).
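The sketches below show one way the two procedures could be realized in PyTorch. They are minimal interpretations of the description above, not the authors' reference implementation: the neighbor search is instantiated here as a single FGSM-style step (one plausible choice), and the names `nearest_incorrect_labels`, `boundary_shrink`, and all hyperparameters (`epsilon`, `lr`, `epochs`) are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def nearest_incorrect_labels(model, x, y, epsilon=0.1):
    # Neighbor search (illustrative): take a small FGSM-style step away from the
    # true class and read off the label the model now predicts, excluding the
    # original label, as the "nearest incorrect label".
    x_adv = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x_adv), y).backward()
    with torch.no_grad():
        x_adv = x_adv + epsilon * x_adv.grad.sign()
        logits = model(x_adv)
        logits[torch.arange(len(y)), y] = float("-inf")  # never return the true label
        return logits.argmax(dim=1)

def boundary_shrink(model, forget_loader, epochs=5, lr=1e-4, epsilon=0.1):
    # Fine-tune only on the forgetting samples, relabeled to their nearest
    # incorrect class, so the forgetting class's decision region collapses.
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    model.train()
    for _ in range(epochs):
        for x, y in forget_loader:
            y_neighbor = nearest_incorrect_labels(model, x, y, epsilon)
            opt.zero_grad()  # clears gradients left over from the neighbor search
            F.cross_entropy(model(x), y_neighbor).backward()
            opt.step()
    return model
```

Boundary Expanding can be sketched in the same spirit: widen the classifier head by one shadow neuron, fine-tune on the forgetting data relabeled to that neuron, then prune it. The assumption that the final layer is exposed as `model.classifier`, along with the function name and hyperparameters, is again hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def boundary_expanding(model, forget_loader, num_classes, epochs=5, lr=1e-4):
    # 1. Widen the classifier head with one extra 'shadow class' neuron.
    old_head = model.classifier  # assumed attribute name for the final nn.Linear
    new_head = nn.Linear(old_head.in_features, num_classes + 1)
    with torch.no_grad():
        new_head.weight[:num_classes].copy_(old_head.weight)
        new_head.bias[:num_classes].copy_(old_head.bias)
    model.classifier = new_head

    # 2. Fine-tune with every forgetting sample assigned to the shadow class.
    shadow_label = num_classes
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    model.train()
    for _ in range(epochs):
        for x, _ in forget_loader:
            y_shadow = torch.full((x.size(0),), shadow_label, dtype=torch.long)
            opt.zero_grad()
            F.cross_entropy(model(x), y_shadow).backward()
            opt.step()

    # 3. Prune the shadow neuron, restoring the original output dimension.
    pruned_head = nn.Linear(new_head.in_features, num_classes)
    with torch.no_grad():
        pruned_head.weight.copy_(new_head.weight[:num_classes])
        pruned_head.bias.copy_(new_head.bias[:num_classes])
    model.classifier = pruned_head
    return model
```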
Both methods aim to preserve utility and privacy by producing a decision space that closely matches that of a model retrained without the forgetting data.
Experimental Results
The effectiveness of Boundary Unlearning was evaluated on the CIFAR-10 and VGGFace2 datasets, covering image classification and face recognition tasks. Notably, the proposed methods achieved significant speed-ups, up to 17x faster than retraining from scratch, while effectively erasing the influence of the forgetting class.
The experiments showed that accuracy on the non-forgetting classes is maintained with minimal degradation, affirming the utility guarantee. Furthermore, evaluating the attack success rate of membership inference attacks indicated that both methods closely match the retrained model's privacy guarantee, showing that privacy leakage about the forgetting data is effectively prevented.
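The privacy evaluation can be approximated with a very simple confidence-based membership inference check; the following is an illustrative stand-in (the function name, loader, and threshold are assumptions), not the specific attack used in the paper.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def mia_success_rate(model, forget_loader, threshold=0.9):
    # Flag a forgetting sample as a 'member' if the model is still highly
    # confident on its original label; after successful unlearning this rate
    # should drop to roughly the level of a model retrained without that class.
    model.eval()
    flagged, total = 0, 0
    for x, y in forget_loader:
        confidence = F.softmax(model(x), dim=1)[torch.arange(len(y)), y]
        flagged += (confidence > threshold).sum().item()
        total += y.numel()
    return flagged / total
```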
Implications and Future Directions
Boundary Unlearning has practical implications for privacy-sensitive environments and AI services in which training data must be updated continually without retraining from scratch. It provides a scalable, resource-efficient solution that can be integrated into systems that must comply with privacy regulations or defend against data poisoning attacks.
Prospective developments may focus on extending these techniques to other AI and ML models beyond DNNs, and further refining boundary manipulation methods to enhance efficacy in dynamic or federated learning environments. Additionally, improvements in computational efficiency and adaptability to diverse data distributions could enhance boundary unlearning's practicality across varied domains.
In conclusion, the paper contributes to the discourse on machine unlearning by providing a feasible alternative to parameter-based methods, emphasizing decision space manipulations as a viable direction for future machine unlearning strategies.