One Prompt to Verify Your Models: Black-Box Text-to-Image Models Verification via Non-Transferable Adversarial Attacks (2410.22725v4)
Abstract: Recently, various Text-to-Image (T2I) models, such as DALL-E and Stable Diffusion, have emerged, each showing advantages in different aspects. As a result, some third-party service platforms collect different model interfaces and provide cheaper API services and more flexibility in T2I model selection. However, this raises a new security concern: are these third-party services truly offering the models they claim? To answer this question, we first define the concept of T2I model verification, which aims to determine whether a black-box target model is identical to a given white-box reference T2I model. We then propose VerifyPrompt, which performs T2I model verification through a specially designed verify prompt. Intuitively, the verify prompt is an adversarial prompt for the target model that does not transfer to other models: it makes the target model generate a specific image while making other models produce entirely different images. Specifically, VerifyPrompt uses the Non-dominated Sorting Genetic Algorithm II (NSGA-II) to optimize the cosine similarity of a prompt's text encoding, generating verify prompts. Finally, by computing the CLIP-text similarity scores between the prompts and the generated images, VerifyPrompt can determine whether the target model aligns with the reference model. Experimental results demonstrate that VerifyPrompt consistently achieves over 90% accuracy across various T2I models, confirming its effectiveness on practical model platforms (such as Hugging Face).
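To make the ingredients named in the abstract concrete, here is a minimal Python sketch of three of them: cosine similarity between encodings, the non-dominated (Pareto) filter at the core of NSGA-II, and a threshold-based verification decision. It is an illustration under stated assumptions, not the authors' implementation. The two-objective setup (reward similarity under the reference model, penalize similarity under other models), the toy scores, the function names, and the threshold value are all hypothetical; a real pipeline would obtain encodings and CLIP scores from actual models and would run the full NSGA-II loop (crossover, mutation, crowding distance), which is omitted here.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def dominates(p, q):
    """True if p is at least as good as q on every objective and strictly
    better on at least one. All objectives are treated as maximized."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))


def first_pareto_front(scores):
    """Indices of candidates that no other candidate dominates.
    This is the first front produced by non-dominated sorting."""
    return [
        i
        for i, p in enumerate(scores)
        if not any(dominates(q, p) for j, q in enumerate(scores) if j != i)
    ]


def is_same_model(similarities, threshold):
    """Hypothetical decision rule: accept the target model as matching the
    reference if the mean prompt-image similarity reaches the threshold."""
    return sum(similarities) / len(similarities) >= threshold


# Toy objective tuples for four candidate prompts (made-up numbers):
# (similarity under the reference model, negated similarity under other models).
candidate_scores = [(0.9, -0.4), (0.8, -0.1), (0.7, -0.3), (0.6, -0.05)]
front = first_pareto_front(candidate_scores)
print("non-dominated candidates:", front)
```

In this toy run, candidate 2 is dropped because candidate 1 is better on both objectives, while the remaining candidates each represent a different trade-off. A decision such as `is_same_model(scores, threshold=0.25)` would then be applied to similarity scores measured on images returned by the black-box service; the threshold here is an arbitrary placeholder.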