Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
41 tokens/sec
GPT-4o
59 tokens/sec
Gemini 2.5 Pro Pro
41 tokens/sec
o3 Pro
7 tokens/sec
GPT-4.1 Pro
50 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Red Teaming Visual Language Models (2401.12915v1)

Published 23 Jan 2024 in cs.AI, cs.CL, and cs.CV

Abstract: VLMs (Vision-LLMs) extend the capabilities of LLMs to accept multimodal inputs. Since it has been verified that LLMs can be induced to generate harmful or inaccurate content through specific test cases (termed as Red Teaming), how VLMs perform in similar scenarios, especially with their combination of textual and visual inputs, remains a question. To explore this problem, we present a novel red teaming dataset RTVLM, which encompasses 10 subtasks (e.g., image misleading, multi-modal jail-breaking, face fairness, etc) under 4 primary aspects (faithfulness, privacy, safety, fairness). Our RTVLM is the first red-teaming dataset to benchmark current VLMs in terms of these 4 different aspects. Detailed analysis shows that 10 prominent open-sourced VLMs struggle with the red teaming in different degrees and have up to 31% performance gap with GPT-4V. Additionally, we simply apply red teaming alignment to LLaVA-v1.5 with Supervised Fine-tuning (SFT) using RTVLM, and this bolsters the models' performance with 10% in RTVLM test set, 13% in MM-Hal, and without noticeable decline in MM-Bench, overpassing other LLaVA-based models with regular alignment data. This reveals that current open-sourced VLMs still lack red teaming alignment. Our code and datasets will be open-source.

Summary of the Red Teaming Visual LLMs Study

Introduction

The emergence of Vision-LLMs (VLMs), which combine the textual and visual processing capabilities of LLMs, has broadened the spectrum of AI applications. Despite the evident progress in VLMs, the lack of systematic red teaming benchmarks prompted the introduction of the Red Teaming Visual LLM (RTVLM) dataset. This newly constructed dataset assesses VLMs in areas crucial for secure deployment: Faithfulness, Safety, Privacy, and Fairness.

RTVLM Dataset Construction

RTVLM includes ten subtasks, each designed to target specific vulnerabilities within VLMs. The dataset ensures novelty by using images generated via diffusion techniques and human-annotated or GPT-4 generated questions. In evaluating Faithfulness, the dataset includes text and visual misleading tasks, and image order processing; Privacy is assessed through the distinction between public figures and private individuals, while Safety tests model responses to ethically risky inputs. For Fairness, VLMs are evaluated for bias towards individuals of varying races and genders.

Experimental Results

Upon evaluation, it was found that VLMs exhibit performance gaps in red teaming tasks and often lack red teaming alignment. The dataset served to benchmark and perform detailed analysis on 10 prominent VLMs, highlighting up to a 31% performance gap with GPT-4V. Incorporating RTVLM for Supervised Fine-tuning (SFT) into models like LLaVA-v1.5 improved performance significantly on the RTVLM test set and related benchmarks without degrading general performance, suggesting the necessity of incorporating red teaming alignment in the training process.

Red Teaming Alignment and Conclusions

The paper elucidates that current alignment practices in VLMs are insufficient when encountering red teaming scenarios. It also empirically demonstrates that directly aligning models with RTVLM improves both the safety and robustness of model outputs. The paper concludes by underscoring the importance of VLM security, and the RTVLM dataset is posited as a valuable asset for advancing model security measures.

Definition Search Book Streamline Icon: https://streamlinehq.com
References (42)
  1. Flamingo: a visual language model for few-shot learning. ArXiv, abs/2204.14198.
  2. Openflamingo: An open-source framework for training large autoregressive vision-language models. arXiv preprint arXiv:2308.01390.
  3. Qwen-vl: A frontier large vision-language model with versatile abilities. arXiv preprint arXiv:2308.12966.
  4. Introducing our multimodal models.
  5. The secret sharer: Evaluating and testing unintended memorization in neural networks. In Proceedings of the 28th USENIX Conference on Security Symposium, pages 267–284. USENIX Association.
  6. Sharegpt4v: Improving large multi-modal models with better captions.
  7. Pali-x: On scaling up a multilingual vision and language model. ArXiv, abs/2305.18565.
  8. Can language models be instructed to protect personal information?
  9. Instructblip: Towards general-purpose vision-language models with instruction tuning. ArXiv preprint, abs/2305.06500.
  10. A survey for in-context learning.
  11. Bias and fairness in large language models: A survey.
  12. Red teaming language models to reduce harms: Methods, scaling behaviors, and lessons learned. arXiv preprint arXiv:2209.07858.
  13. Chatgpt outperforms crowd-workers for text-annotation tasks. ArXiv preprint, abs/2303.15056.
  14. Lora: Low-rank adaptation of large language models. In The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022.
  15. Reducing sentiment bias in language models via counterfactual evaluation. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 65–83.
  16. A hierarchical approach for generating descriptive image paragraphs.
  17. Obelics: An open web-scale filtered dataset of interleaved image-text documents.
  18. Blip-2: Bootstrapping language-image pre-training with frozen image encoders and large language models. ArXiv preprint, abs/2301.12597.
  19. Silkie: Preference distillation for large visual language models.
  20. M33{}^{3}start_FLOATSUPERSCRIPT 3 end_FLOATSUPERSCRIPTIT: A large-scale dataset towards multi-modal multilingual instruction tuning. ArXiv preprint, abs/2306.04387.
  21. Truthfulqa: Measuring how models mimic human falsehoods.
  22. Aligning large multi-modal model with robust instruction tuning. arXiv preprint arXiv:2306.14565.
  23. Improved baselines with visual instruction tuning.
  24. Visual instruction tuning. ArXiv preprint, abs/2304.08485.
  25. Deep learning face attributes in the wild. In Proceedings of International Conference on Computer Vision (ICCV).
  26. Stable bias: Analyzing societal representations in diffusion models.
  27. OpenAI. 2023. Gpt-4v(ision) system card.
  28. Instruction tuning with gpt-4. ArXiv preprint, abs/2304.03277.
  29. Red teaming language models with language models. arXiv preprint arXiv:2202.03286.
  30. True few-shot learning with language models. arXiv.
  31. Visual adversarial examples jailbreak aligned large language models. In The Second Workshop on New Frontiers in Adversarial Machine Learning.
  32. Direct preference optimization: Your language model is secretly a reward model. In Thirty-seventh Conference on Neural Information Processing Systems.
  33. Aligning large multimodal models with factually augmented rlhf.
  34. Knowledge mining with scene text for fine-grained recognition. In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 4614–4623.
  35. Large language models are not fair evaluators.
  36. Self-instruct: Aligning language models with self-generated instructions.
  37. Gpt-4v(ision) as a generalist evaluator for vision-language tasks.
  38. Llavar: Enhanced visual instruction tuning for text-rich image understanding.
  39. Mmicl: Empowering vision-language model with multi-modal in-context learning. arXiv preprint arXiv:2309.07915.
  40. Mquake: Assessing knowledge editing in language models via multi-hop questions.
  41. Minigpt-4: Enhancing vision-language understanding with advanced large language models. ArXiv preprint, abs/2304.10592.
  42. Universal and transferable adversarial attacks on aligned language models.
User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (6)
  1. Mukai Li (17 papers)
  2. Lei Li (1293 papers)
  3. Yuwei Yin (21 papers)
  4. Masood Ahmed (1 paper)
  5. Zhenguang Liu (55 papers)
  6. Qi Liu (485 papers)
Citations (20)