ConceptPrune: Concept Editing in Diffusion Models via Skilled Neuron Pruning (2405.19237v1)

Published 29 May 2024 in cs.CV, cs.AI, and cs.LG

Abstract: While large-scale text-to-image diffusion models have demonstrated impressive image-generation capabilities, there are significant concerns about their potential misuse for generating unsafe content, violating copyright, and perpetuating societal biases. Recently, the text-to-image generation community has begun addressing these concerns by editing or unlearning undesired concepts from pre-trained models. However, these methods often involve data-intensive and inefficient fine-tuning or utilize various forms of token remapping, rendering them susceptible to adversarial jailbreaks. In this paper, we present a simple and effective training-free approach, ConceptPrune, wherein we first identify critical regions within pre-trained models responsible for generating undesirable concepts, thereby facilitating straightforward concept unlearning via weight pruning. Experiments across a range of concepts including artistic styles, nudity, object erasure, and gender debiasing demonstrate that target concepts can be efficiently erased by pruning a tiny fraction, approximately 0.12% of total weights, enabling multi-concept erasure and robustness against various white-box and black-box adversarial attacks.
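To make the pruning recipe in the abstract concrete, here is a minimal sketch of skilled-neuron pruning for a single feed-forward (FFN) layer. This is an illustration under stated assumptions, not the paper's implementation: it assumes FFN hidden activations have been recorded for prompts containing the target concept and for neutral reference prompts, scores each neuron by the gap in mean activation, and zeroes the outgoing weights of the top-scoring ("skilled") neurons. The function name, tensor shapes, and the sparsity value are all illustrative.

```python
# Hypothetical sketch of skilled-neuron pruning in one FFN layer.
# Shapes, thresholds, and the `sparsity` value are illustrative assumptions,
# not the paper's exact hyperparameters.
import torch

def prune_skilled_neurons(ffn_out_weight: torch.Tensor,
                          acts_concept: torch.Tensor,
                          acts_reference: torch.Tensor,
                          sparsity: float = 0.01) -> torch.Tensor:
    """Zero the output-projection columns of FFN neurons whose mean
    activation on concept prompts most exceeds that on reference prompts.

    ffn_out_weight : (d_model, d_ffn) output projection of the FFN
    acts_concept   : (n_concept, d_ffn) hidden activations recorded while
                     generating the target concept
    acts_reference : (n_reference, d_ffn) activations on neutral prompts
    sparsity       : fraction of neurons to prune in this layer
    """
    # Score each neuron by how much more active it is for the concept.
    score = acts_concept.mean(dim=0) - acts_reference.mean(dim=0)
    k = max(1, int(sparsity * score.numel()))
    skilled = torch.topk(score, k).indices
    pruned = ffn_out_weight.clone()
    # Removing a neuron = zeroing its outgoing weights (one column each).
    pruned[:, skilled] = 0.0
    return pruned

# Toy usage: plant 8 concept-specific neurons among 2048 in a 320-wide layer.
torch.manual_seed(0)
W = torch.randn(320, 2048)
acts_c = torch.randn(64, 2048)
acts_c[:, :8] += 3.0                       # concept-specific activation boost
acts_r = torch.randn(64, 2048)
W_pruned = prune_skilled_neurons(W, acts_c, acts_r, sparsity=8 / 2048)
print((W_pruned == 0).all(dim=0).sum().item(), "neurons pruned")
```

In this sketch, removing a neuron means zeroing the corresponding column of the layer's output projection, which silences that neuron's contribution to every downstream feature while touching only a small fraction of the layer's weights.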
