UPAM: Unified Prompt Attack in Text-to-Image Generation Models Against Both Textual Filters and Visual Checkers (2405.11336v2)
Abstract: Text-to-Image (T2I) models have raised security concerns due to their potential to generate inappropriate or harmful images. In this paper, we propose UPAM, a novel framework that investigates the robustness of T2I models from the attack perspective. Unlike most existing attack methods that focus on deceiving textual defenses, UPAM aims to deceive both textual and visual defenses in T2I models. UPAM enables gradient-based optimization, offering greater effectiveness and efficiency than previous methods. Given that T2I models might not return results due to defense mechanisms, we introduce a Sphere-Probing Learning (SPL) scheme to support gradient optimization even when no results are returned. Additionally, we devise a Semantic-Enhancing Learning (SEL) scheme to finetune UPAM for generating target-aligned images. Our framework also ensures attack stealthiness. Extensive experiments demonstrate UPAM's effectiveness and efficiency.
Authors: Duo Peng, Qiuhong Ke, Jun Liu