Red-Teaming Segment Anything Model (2404.02067v1)

Published 2 Apr 2024 in cs.CV, cs.AI, and cs.LG

Abstract: Foundation models have emerged as pivotal tools, tackling many complex tasks through pre-training on vast datasets and subsequent fine-tuning for specific applications. The Segment Anything Model is one of the first and most well-known foundation models for computer vision segmentation tasks. This work presents a multi-faceted red-teaming analysis that tests the Segment Anything Model against challenging tasks: (1) We analyze the impact of style transfer on segmentation masks, demonstrating that applying adverse weather conditions and raindrops to dashboard images of city roads significantly distorts generated masks. (2) We assess whether the model can be used for attacks on privacy, such as recognizing celebrities' faces, and show that the model possesses some undesired knowledge in this task. (3) Finally, we check how robust the model is to adversarial attacks on segmentation masks under text prompts. We not only show the effectiveness of popular white-box attacks and resistance to black-box attacks, but also introduce a novel approach, the Focused Iterative Gradient Attack (FIGA), which combines white-box approaches to construct an efficient attack resulting in a smaller number of modified pixels. All of our testing methods and analyses indicate a need for enhanced safety measures in foundation models for image segmentation.
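The abstract's third point describes an iterative, gradient-based attack that concentrates its perturbation on a small set of pixels. The sketch below is a minimal, generic illustration of that idea, not the paper's FIGA implementation: it combines an FGSM-style signed gradient step with per-step top-k pixel selection so that only the most salient pixels are ever modified. The `segmentation_model`, the tensor shapes, and every hyperparameter here are assumptions made for the example.

```python
import torch
import torch.nn.functional as F

def focused_gradient_attack(segmentation_model, image, clean_mask,
                            steps=50, step_size=2 / 255, pixels_per_step=100):
    """Illustrative sketch (not the paper's FIGA): iteratively perturb only the
    pixels whose gradients most increase the segmentation loss, keeping the
    total number of modified pixels small.

    Assumptions: `image` is (1, C, H, W) in [0, 1]; `segmentation_model(image)`
    returns mask logits of shape (1, 1, H, W); `clean_mask` is the binary mask
    predicted on the clean image, which the attack tries to degrade.
    """
    adv = image.clone().detach()
    for _ in range(steps):
        adv.requires_grad_(True)
        logits = segmentation_model(adv)
        # Loss w.r.t. the clean mask; ascending it pushes predictions away from it.
        loss = F.binary_cross_entropy_with_logits(logits, clean_mask.float())
        (grad,) = torch.autograd.grad(loss, adv)

        # Saliency: total gradient magnitude per spatial location.
        saliency = grad.abs().sum(dim=1, keepdim=True)            # (1, 1, H, W)
        topk = torch.topk(saliency.flatten(), pixels_per_step).indices
        pixel_mask = torch.zeros_like(saliency).flatten()
        pixel_mask[topk] = 1.0
        pixel_mask = pixel_mask.view_as(saliency)                 # broadcasts over channels

        # FGSM-style signed step, applied only at the selected pixels.
        adv = (adv.detach() + step_size * grad.sign() * pixel_mask).clamp(0.0, 1.0)
    return adv
```

In this untargeted form the attack ascends the loss against the model's own clean prediction; a targeted variant would instead descend the loss toward an attacker-chosen mask. The per-step top-k selection is what keeps the perturbation sparse, in the spirit of the "smaller number of modified pixels" goal the abstract mentions.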
