Machine Unlearning for Image-to-Image Generative Models (2402.00351v2)

Published 1 Feb 2024 in cs.LG and cs.CV

Abstract: Machine unlearning has emerged as a new paradigm for deliberately forgetting data samples from a given model in order to adhere to stringent regulations. However, existing machine unlearning methods have focused primarily on classification models, leaving the landscape of unlearning for generative models relatively unexplored. This paper serves as a bridge, addressing the gap by providing a unifying framework of machine unlearning for image-to-image generative models. Within this framework, we propose a computationally efficient algorithm, underpinned by rigorous theoretical analysis, that demonstrates negligible performance degradation on the retain samples while effectively removing the information from the forget samples. Empirical studies on two large-scale datasets, ImageNet-1K and Places-365, further show that our algorithm does not rely on the availability of the retain samples, which further complies with data retention policies. To the best of our knowledge, this work is the first systematic, theoretical, and empirical exploration of machine unlearning specifically tailored for image-to-image generative models. Our code is available at https://github.com/jpmorganchase/l2l-generator-unlearning.


Summary

  • The paper presents a novel algorithm that efficiently unlearns specific data from I2I generative models without retraining.
  • It applies across diverse I2I architectures, including diffusion models, VQ-GAN, and MAE, while preserving performance on retained data.
  • Empirical evaluations on ImageNet-1K and Places-365 demonstrate the framework's robustness and corroborate its theoretical analysis.

Machine Unlearning in Image-to-Image Generative Models

Introduction to Machine Unlearning

In the context of machine learning, the notion of data privacy has garnered significant attention with the advent of legal frameworks aiming to protect individual privacy rights. These frameworks often encompass the 'Right to be Forgotten,' empowering individuals to request the deletion of their data. The challenge extends to models trained on this data, particularly since retraining from scratch is resource-intensive and slow. In response, the concept of 'machine unlearning' has been proposed to remove specific data from trained models while maintaining overall model integrity.
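For concreteness, the setting is often formalized as follows; this is a generic statement common in the unlearning literature, not a definition quoted from the paper:

```latex
% Generic machine-unlearning setup (a standard formalization, not quoted
% from the paper). The training set D splits into a retain set D_R and a
% forget set D_F:
\[
  D = D_R \cup D_F, \qquad D_R \cap D_F = \emptyset .
\]
% An unlearning procedure U maps the trained parameters \theta = \mathcal{A}(D)
% and the forget set to new parameters that should behave as if the model had
% been trained on D_R alone:
\[
  \theta' = \mathcal{U}(\theta, D_F)
  \quad \text{such that} \quad
  \theta' \approx \mathcal{A}(D_R),
\]
% where \mathcal{A} denotes the original training algorithm.
```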

Unlearning for Generative Models

Existing machine unlearning protocols largely cater to classification tasks, but there is an increasing imperative to extend this capability to generative models such as image-to-image (I2I) models, which are known to memorize their training data. This research systematically addresses the gap by designing a dedicated machine unlearning framework for I2I generative models, covering diffusion models, VQ-GAN, and MAE architectures. The framework comes with theoretical guarantees: performance on retained data is preserved, while the model is fully dissociated from the 'forgotten' data samples. A rough sketch of this objective appears below.
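Concretely, the retain/forget trade-off can be expressed as a two-term loss: one term keeps the updated encoder close to the original model on retain samples, and the other pulls forget-sample embeddings toward those of Gaussian noise. The sketch below is a minimal PyTorch rendering of this idea; the names (`encoder`, `frozen_encoder`, `retain_x`, `forget_x`) and the weighting `alpha` are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn.functional as F

def unlearning_loss(encoder, frozen_encoder, retain_x, forget_x, alpha=1.0):
    """Hypothetical two-term unlearning objective for an I2I model's encoder.

    Retain term: keep embeddings of retain samples close to those of the
    original (frozen) encoder, preserving performance on kept data.
    Forget term: pull embeddings of forget samples toward those of Gaussian
    noise images, dissociating the model from the forgotten data.
    """
    with torch.no_grad():
        retain_target = frozen_encoder(retain_x)                    # original embeddings
        noise_target = frozen_encoder(torch.randn_like(forget_x))   # noise embeddings

    retain_loss = F.mse_loss(encoder(retain_x), retain_target)
    forget_loss = F.mse_loss(encoder(forget_x), noise_target)
    return retain_loss + alpha * forget_loss
```

In this sketch only the encoder receives gradients; leaving the decoder frozen keeps the generative head intact and makes the procedure cheap relative to retraining from scratch.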

Theoretical Contributions and Empirical Validations

This work contributes a theoretically grounded, computationally efficient algorithm for unlearning in I2I generative models. The algorithm is justified through extensive theoretical analysis demonstrating its unique optimality and its negligible impact on performance for retained data. Empirical evaluations on the large-scale ImageNet-1K and Places-365 datasets underscore the robustness of the framework. Notably, the algorithm's effectiveness does not depend on access to the exact retain samples, a key consideration for compliance with data retention policies; a hypothetical loop illustrating this substitution follows.
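Because a loss of the form sketched earlier only needs some in-distribution images to anchor its retain term, a proxy retain set can stand in for the real one. The snippet below is a hypothetical training loop reusing that `unlearning_loss` sketch; `proxy_loader` and `forget_loader` are assumed dataloaders yielding image batches, not part of the released code.

```python
import torch

# `encoder` is the trainable copy; `frozen_encoder` is the frozen original
# (both assumed defined as in the earlier sketch).
optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-4)

for proxy_x, forget_x in zip(proxy_loader, forget_loader):
    # proxy_x: images from a comparable distribution, standing in for the
    # unavailable retain samples; forget_x: samples to be unlearned.
    loss = unlearning_loss(encoder, frozen_encoder, proxy_x, forget_x)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```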

Practical Implications and Future Directions

The findings provide an avenue to unlearn specific data from I2I generative models without retraining or access to the retained data samples, offering practical benefits for regulatory compliance. As the first comprehensive exploration of this domain, the research opens pathways for future work on unlearning across other types of generative models, on reducing dependence on forget samples, and on benchmarks for content control and privacy protection in AI-generated material. That said, the approach currently covers only specific I2I models, and future work is required to generalize these ideas to further modalities and broader real-world applications.
