Machine Unlearning for Image-to-Image Generative Models (2402.00351v2)

Published 1 Feb 2024 in cs.LG and cs.CV

Abstract: Machine unlearning has emerged as a new paradigm for deliberately forgetting data samples from a given model in order to adhere to stringent regulations. However, existing machine unlearning methods have focused primarily on classification models, leaving the landscape of unlearning for generative models relatively unexplored. This paper serves as a bridge, addressing the gap by providing a unifying framework of machine unlearning for image-to-image generative models. Within this framework, we propose a computationally efficient algorithm, underpinned by rigorous theoretical analysis, that demonstrates negligible performance degradation on the retain samples while effectively removing the information from the forget samples. Empirical studies on two large-scale datasets, ImageNet-1K and Places-365, further show that our algorithm does not rely on the availability of the retain samples, which further complies with data retention policies. To the best of our knowledge, this work is the first systematic, theoretical, and empirical exploration of machine unlearning specifically tailored for image-to-image generative models. Our code is available at https://github.com/jpmorganchase/l2l-generator-unlearning.


Summary

  • The paper presents a novel algorithm that efficiently unlearns specific data from I2I generative models without retraining.
  • It applies across diverse I2I architectures, including diffusion models, VQ-GAN, and MAE, while preserving performance on retained data.
  • Empirical evaluations on ImageNet-1K and Places-365 demonstrate the framework's robustness and corroborate its theoretical analysis.

Machine Unlearning in Image-to-Image Generative Models

Introduction to Machine Unlearning

In the context of machine learning, the notion of data privacy has garnered significant attention with the advent of legal frameworks aiming to protect individual privacy rights. These frameworks often encompass the 'Right to be Forgotten,' empowering individuals to request the deletion of their data. The challenge extends to models trained on this data, particularly since retraining from scratch is resource-intensive and slow. In response, the concept of 'machine unlearning' has been proposed to remove specific data from trained models while maintaining overall model integrity.
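For concreteness, the setting is often formalized as follows; this is a generic statement common in the unlearning literature, not a definition quoted from the paper:

```latex
% Generic machine-unlearning setup (a standard formalization, not quoted
% from the paper). The training set D splits into a retain set D_R and a
% forget set D_F:
\[
  D = D_R \cup D_F, \qquad D_R \cap D_F = \emptyset .
\]
% An unlearning procedure U maps the trained parameters \theta = \mathcal{A}(D)
% and the forget set to new parameters that should behave as if the model had
% been trained on D_R alone:
\[
  \theta' = \mathcal{U}(\theta, D_F)
  \quad \text{such that} \quad
  \theta' \approx \mathcal{A}(D_R),
\]
% where \mathcal{A} denotes the original training algorithm.
```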

Unlearning for Generative Models

Existing machine unlearning protocols largely cater to classification tasks, but there is an increasing imperative to extend this capability to generative models such as image-to-image (I2I) models, which are known to memorize their training data. This research systematically addresses the gap by designing a dedicated machine unlearning framework for I2I generative models, covering diffusion models, VQ-GAN, and MAE architectures. The framework comes with theoretical guarantees: performance on retained data is preserved, while the model is fully dissociated from the 'forgotten' data samples. A rough sketch of this objective appears below.
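Concretely, the retain/forget trade-off can be expressed as a two-term loss: one term keeps the updated encoder close to the original model on retain samples, and the other pulls forget-sample embeddings toward those of Gaussian noise. The sketch below is a minimal PyTorch rendering of this idea; the names (`encoder`, `frozen_encoder`, `retain_x`, `forget_x`) and the weighting `alpha` are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn.functional as F

def unlearning_loss(encoder, frozen_encoder, retain_x, forget_x, alpha=1.0):
    """Hypothetical two-term unlearning objective for an I2I model's encoder.

    Retain term: keep embeddings of retain samples close to those of the
    original (frozen) encoder, preserving performance on kept data.
    Forget term: pull embeddings of forget samples toward those of Gaussian
    noise images, dissociating the model from the forgotten data.
    """
    with torch.no_grad():
        retain_target = frozen_encoder(retain_x)                    # original embeddings
        noise_target = frozen_encoder(torch.randn_like(forget_x))   # noise embeddings

    retain_loss = F.mse_loss(encoder(retain_x), retain_target)
    forget_loss = F.mse_loss(encoder(forget_x), noise_target)
    return retain_loss + alpha * forget_loss
```

In this sketch only the encoder receives gradients; leaving the decoder frozen keeps the generative head intact and makes the procedure cheap relative to retraining from scratch.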

Theoretical Contributions and Empirical Validations

This work contributes a theoretically grounded, computationally efficient algorithm for unlearning in I2I generative models. The algorithm is justified through extensive theoretical analysis demonstrating its unique optimality and its negligible impact on performance for retained data. Empirical evaluations on the large-scale ImageNet-1K and Places-365 datasets underscore the robustness of the framework. Notably, the algorithm's effectiveness does not depend on access to the exact retain samples, a key consideration for compliance with data retention policies; a hypothetical loop illustrating this substitution follows.
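Because a loss of the form sketched earlier only needs some in-distribution images to anchor its retain term, a proxy retain set can stand in for the real one. The snippet below is a hypothetical training loop reusing that `unlearning_loss` sketch; `proxy_loader` and `forget_loader` are assumed dataloaders yielding image batches, not part of the released code.

```python
import torch

# `encoder` is the trainable copy; `frozen_encoder` is the frozen original
# (both assumed defined as in the earlier sketch).
optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-4)

for proxy_x, forget_x in zip(proxy_loader, forget_loader):
    # proxy_x: images from a comparable distribution, standing in for the
    # unavailable retain samples; forget_x: samples to be unlearned.
    loss = unlearning_loss(encoder, frozen_encoder, proxy_x, forget_x)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```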

Practical Implications and Future Directions

The findings provide an avenue to unlearn specific data from I2I generative models without retraining or access to the retained data samples, offering practical benefits for regulatory compliance. As the first comprehensive exploration of this domain, the research opens pathways for future work on unlearning across other types of generative models, on reducing dependence on forget samples, and on benchmarks for content control and privacy protection in AI-generated material. That said, the approach currently covers only specific I2I models, and future work is required to generalize these ideas to further modalities and broader real-world applications.
