
Remember What You Want to Forget: Algorithms for Machine Unlearning (2103.03279v2)

Published 4 Mar 2021 in cs.LG and cs.AI

Abstract: We study the problem of unlearning datapoints from a learnt model. The learner first receives a dataset $S$ drawn i.i.d. from an unknown distribution, and outputs a model $\widehat{w}$ that performs well on unseen samples from the same distribution. However, at some point in the future, any training datapoint $z \in S$ can request to be unlearned, thus prompting the learner to modify its output model while still ensuring the same accuracy guarantees. We initiate a rigorous study of generalization in machine unlearning, where the goal is to perform well on previously unseen datapoints. Our focus is on both computational and storage complexity. For the setting of convex losses, we provide an unlearning algorithm that can unlearn up to $O(n/d^{1/4})$ samples, where $d$ is the problem dimension. In comparison, in general, differentially private learning (which implies unlearning) only guarantees deletion of $O(n/d^{1/2})$ samples. This demonstrates a novel separation between differential privacy and machine unlearning.
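To make the gap between the two bounds concrete, here is an illustrative back-of-the-envelope calculation that ignores constants and logarithmic factors (the specific numbers are not from the paper): for $n = 10^8$ training points in dimension $d = 10^4$,

```latex
\underbrace{O\!\left(n / d^{1/2}\right)}_{\text{via differential privacy}} \approx 10^{6}
\qquad \text{vs.} \qquad
\underbrace{O\!\left(n / d^{1/4}\right)}_{\text{dedicated unlearning}} \approx 10^{7}
```

so under these assumptions the dedicated unlearning algorithm can honor roughly ten times as many deletion requests before a full retrain becomes necessary.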

Citations (240)

Summary

  • The paper distinguishes machine unlearning from differential privacy, enabling efficient data removal without full retraining.
  • The paper introduces an algorithm for convex losses that achieves a quadratic improvement in deletion capacity while preserving accuracy.
  • The approach optimizes storage and computation, making it practical for large-scale applications and future extensions to non-convex and online unlearning.

Algorithms for Machine Unlearning: An Overview

The paper "Remember What You Want to Forget: Algorithms for Machine Unlearning" by Ayush Sekhari, Jayadev Acharya, Gautam Kamath, and Ananda Theertha Suresh addresses the problem of machine unlearning: when users request that their data be removed from a trained model, the learner must update the model so that it no longer carries any influence of the forgotten data, ideally without retraining from scratch.

Motivation

The motivation for machine unlearning stems from privacy concerns and legal requirements, such as GDPR and CCPA, which mandate the deletion of user data upon request. The complexity arises from the need to maintain the model's performance on new, unseen data while efficiently removing traces of the unlearned data without the expense of full retraining.

Main Contributions

  1. Separation from Differential Privacy: The authors establish a formal distinction between machine unlearning and differential privacy (DP). A DP learner implies unlearning, but in general guarantees deletion of only $O(n/d^{1/2})$ samples; dedicated unlearning algorithms can do strictly better.
  2. Algorithm for Convex Losses: The paper introduces an unlearning algorithm tailored to convex loss functions that can unlearn up to $O(n/d^{1/4})$ samples while retaining the model's accuracy guarantees on unseen data.
  3. Quadratic Improvement in Deletion Capacity: The proposed method improves the dimension dependence of the deletion capacity quadratically over DP-based approaches, by injecting a perturbation tailored to the deleted samples rather than data-independent uniform noise.
  4. Focus on Generalization: Unlike previous works that primarily focused on empirical risk minimization, this paper requires machine unlearning algorithms to generalize well to unseen data.
  5. Efficient Storage and Computation: The proposed methods use additional storage that does not grow with the dataset size, making the approach practical for large-scale applications.
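The convex-loss algorithm can be sketched as a single Newton-style correction followed by Gaussian noise. The toy ridge-regression loss, the function names, and the noise handling below are illustrative assumptions in the spirit of the paper's approach, not the authors' exact algorithm; note that only a $d \times d$ Hessian summary is stored, so the storage footprint is independent of $n$ (contribution 5 above).

```python
import numpy as np

def train(X, y, lam):
    """Ridge regression: minimize ||Xw - y||^2 / (2n) + lam * ||w||^2 / 2."""
    n, d = X.shape
    w = np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)
    # Besides w, the learner keeps only the d x d sum of per-sample
    # Hessians -- storage independent of the dataset size n.
    return w, X.T @ X

def unlearn(w_hat, H_total, X_del, y_del, n, lam, sigma=0.0, rng=None):
    """Remove the influence of (X_del, y_del) from w_hat with one Newton step."""
    m, d = X_del.shape
    # Sum of per-sample loss gradients of the deleted points at w_hat.
    g_del = X_del.T @ (X_del @ w_hat - y_del)
    # Hessian of the regularized loss on the remaining n - m points,
    # recovered from the stored total without revisiting the kept data.
    H = (H_total - X_del.T @ X_del) / (n - m) + lam * np.eye(d)
    # Newton step toward the minimizer of the loss on the remaining data.
    w_minus = w_hat + np.linalg.solve(H, m * lam * w_hat + g_del) / (n - m)
    # Gaussian noise masks the leftover higher-order influence; sigma would
    # be calibrated to the (eps, delta)-unlearning guarantee in the paper.
    if sigma > 0.0:
        rng = rng or np.random.default_rng()
        w_minus = w_minus + rng.normal(0.0, sigma, size=d)
    return w_minus
```

For the quadratic ridge loss the Newton step is exact, so with `sigma=0` the update reproduces retraining on the remaining points; for general convex losses the step is only approximate, which is what the added noise compensates for.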

Implications

By enlarging deletion capacity, this research suggests that deployed models can comply with legal and ethical data-deletion requests without compromising performance on future samples. The emphasis on storage and computational efficiency underscores the practicality of the approach in real-world settings where retraining a model from scratch is resource-intensive.

Potential Developments

The paper opens avenues for further exploration in several areas:

  • Non-Convex Loss Functions: Extending the method to handle non-convex loss functions would broaden the applicability of machine unlearning algorithms.
  • Discrete Hypothesis Class: Exploring machine unlearning for finite hypothesis spaces could be beneficial, particularly in domains where models are discrete by nature.
  • Online Unlearning: Developing algorithms that can handle streaming data would address scenarios where data deletion requests occur continuously over time.

Conclusion

This research sets a significant precedent for developing practical, efficient machine unlearning methods that maintain model accuracy. As privacy laws become more prevalent and stringent, the demand for robust unlearning algorithms will likely grow, positioning this work at the forefront of addressing these emerging challenges in machine learning.